Abstract

In decision making, an optimal point represents the settings at which a classification system should be operated to achieve maximum performance. Clearly, these optimal points are of great importance in classification theory. The selection of the optimal point is of interest, and quantifying the uncertainty in the optimal point and its performance is also important.

The Youden index is a metric currently employed for the selection and performance quantification of optimal points for classification system families. The Youden index quantifies the correct classification rates of a classification system, and its confidence interval quantifies the uncertainty in this measurement. This metric currently focuses on two or three classes, and it only allows the utility of correct classifications and the cost of total misclassifications to be considered. An alternative to this metric for three or more classes is a cost function that considers the sum of incorrect classification rates. This new metric is preferable because it can include class prevalences and the costs associated with every classification. In multi-class settings this informs better decisions and inferences on optimal points.

The work in this dissertation develops theory and methods for confidence intervals on a metric based on misclassification rates, Bayes Cost, and, where possible, on the thresholds found for an optimal point using Bayes Cost. Hypothesis tests for Bayes Cost are also developed to test a classification system's performance or to compare systems, with an emphasis on classification systems involving three or more classes. The performance of the newly proposed methods is demonstrated with simulation.
STATISTICAL INFERENCE ON OPTIMAL POINTS TO EVALUATE MULTI-STATE CLASSIFICATION SYSTEMS

I. Introduction

Decision making occurs daily in a vast range of fields, from health care to information processing and military applications. Generally, these decisions may be based on classification systems which, for example, label an individual as diseased or not diseased, or label an object of interest as a target or non-target. Although such decisions could be made through something as simple as a quick visual inspection, for many decisions of critical importance it is of interest to use statistics and best practices to develop and compare classification systems and to quantify their performance, so as to choose the best classification method available to aid such decisions [68].

A simple classification rule may classify an item into one of two classes, such as "Positive" and "Negative", or "Diseased" and "Not Diseased". Although much research has been conducted to develop methods for quantifying the performance of such classification systems, most real-world applications are more complicated and do not fit into simple binary classification rules. Although classification systems appear in most application areas, this research focuses on examples from medical diagnostics, as medical diagnoses carry great importance as well as the possibility of large consequences from misdiagnosis.

One recent example of a medical diagnostic decision involves the use of biomarkers to diagnose subjects after kidney transplant as having normal kidney function, normal kidney function with proteinuria (a progression toward the diseased state), or chronic allograft nephropathy (the diseased state) [58]. Other examples abound, such as HIV diagnosis. When screening for this disease using a specific biomarker, patients can be categorized into one of three categories: HIV-negative, HIV-positive non-symptomatic, and HIV-positive with AIDS dementia complex [45]. Extending the health concept to structures, we may be interested in detecting the stage of structural damage as none, within a pre-specified safety range, or beyond the safe operating range. In all of these examples the middle class is important because it represents a state in the progression of some phenomenon (e.g., disease or damage). Thus, diagnosis of the middle class may allow for intervention to prevent a subject or specimen from reaching the end state.

There are methods available to determine the performance of a classification system with more than two outcomes. Many of these methods use extensions of receiver operating characteristic (ROC) curve theory to compare classification systems on their ability to correctly classify objects [16, 17, 20, 28, 29]. However, the number of possible outcomes is not the only concern when choosing a classification system. The prevalence of the different classes, as well as the costs associated with making the correct (or incorrect) decision, should also be considered [30, 42, 58, 65]. For example, in HIV diagnosis, different misclassifications may be considered more or less significant. Misdiagnosing an HIV-positive person as non-diseased may be considered much worse than the opposite error (diagnosing a non-diseased person as HIV-positive). In the first scenario, the person will not receive necessary medical intervention and may put others at risk, since they are unaware of their HIV-positive status. Clearly, though, the latter misdiagnosis presents its own cost, in that an individual may begin treatment or otherwise suffer with an incorrect diagnosis.

In a two-class setting, assigning a cost to the different misclassifications is equivalent to assigning an associated cost to the different correct classifications. However, this equivalence does not universally exist for settings with three or more classes. Currently, little work has been done to compare and quantify the performance of multi-class classification systems using the misclassifications. By using the misclassifications, different costs may be placed on all the possible errors made by the classification system [58, 65].

The work of this dissertation improves classification system selection and performance quantification for more complicated classification settings involving three or more classes with unequal costs associated with the different misclassification errors. Specifically, the precision of estimates of classification system metrics and their optimal points is explored through confidence intervals and hypothesis tests to aid decision makers.
II. Classification and Optimal Performance

2.1 Classification System Families

A classification system ($A$) is any process that assigns the elements from $k$ partitions of an event set, $E = \{E_1, E_2, \ldots, E_k\}$, to $k$ distinct elements of a label set, $L = \{L_1, L_2, \ldots, L_k\}$. These partitions may be referred to as classes. For example, a two-class label set could be {0, 1} or {Diseased, Non-Diseased}. Data are collected on the elements, which are then processed into a feature or set of features, $F = (f_1, f_2, \ldots, f_m)$. These features are then used to assign the different elements from $E$ to their respective labels in $L$ ($A: E \to F \to L$). It is assumed that there is a parameter or vector of parameters for the features, $\theta \in \Theta$, that can be altered to change the outcome of the classification system.¹ Thus, for every $\theta \in \Theta$ there is a classification system ($A_\theta$), and the set of classification systems $\mathcal{A} = \{A_\theta : \theta \in \Theta\}$ is called a classification system family (CSF) [58]. It is also assumed that there exists a truth label set, $T = (t_1, t_2, \ldots, t_k)$, such that all elements of the population would be correctly labeled by this set.

A two-class classification system has four outcomes with respect to truth (see Table 2.1). Defining one class as positive and the other class as negative, the possible outcomes from the classification system are true positive, true negative, false positive, and false negative. A true positive occurs when the system correctly classifies a positive element with a "positive" label (the true positive rate is often called sensitivity). A true negative occurs when the system correctly classifies a negative element with a "negative" label (the true negative rate is called specificity). These two outcomes are correct classifications. The other two outcomes are misclassifications. A false positive occurs when the system incorrectly classifies a negative element with a "positive" label. Likewise, a false negative occurs when the system incorrectly classifies a positive element with a "negative" label. The results of a classification system are often arranged in a contingency table, as in Table 2.1, with the truth along the columns and the classification results down the rows.

¹These parameters will generally be referred to as the thresholds for the classification system.
Table 2.1: Two-class contingency table where green cells correspond to correct classifications and red cells correspond to misclassifications.

| CLASSIFICATION / TRUTH | Positive | Negative |
|---|---|---|
| "Positive" | True Positive | False Positive |
| "Negative" | False Negative | True Negative |

An example classification system in a medical diagnostic setting may have elements in partitions of the event set, E = {Non-Diseased, Diseased}, and the label set, L = {"Non-Diseased", "Diseased"}. After the collection of data such as a patient's blood sample, the feature extracted might be the level of a specific biomarker determined from the blood sample, F = (biomarker level, μmol). Then a single threshold, $\theta \in \Theta$, is determined so that whenever the observed biomarker level is less than $\theta$, the patient is labeled "Diseased", and whenever the biomarker level is greater than $\theta$, the patient is labeled "Non-Diseased" (see Figure 2.1). For instance, when total cholesterol (a biomarker feature) is greater than 240 (the threshold), a patient may be labeled with "high cholesterol".

In the two-class case, there are two correct classifications and two misclassifications. In the $k$-class case there are $k$ correct classifications and $k^2 - k$ misclassifications. When there are more than two classes, the outcomes can no longer be described as true or false positives and negatives. Therefore, these terms are generalized to correct classifications and misclassifications. For simplicity of notation, the outcomes are labeled $i|j$, where $j$ is the true label of an element and $i$ is the label assigned by the classification system, $i, j = 1, 2, \ldots, k$. Then for all $i = j$ the outcome is a correct classification, and for all $i \neq j$ the outcome is a misclassification (see Table 2.2 for $k = 3$).
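To make the $i|j$ notation concrete, the following minimal Python sketch estimates all $k^2$ classification probabilities $P_{i|j}$ from a labeled sample as within-class proportions. The function name `classification_rates` and the toy labels are hypothetical; only the convention ($j$ = true class, $i$ = assigned label) comes from the text.

```python
from collections import Counter

# Illustrative sketch; the function name and data are hypothetical.


def classification_rates(truth, labels, k):
    """Estimate P(i|j) for classes coded 1..k.

    truth[n] is the true class j of element n and labels[n] is the label i
    assigned by the classification system. Each estimate is the proportion
    of class-j elements that received label i.
    """
    n_j = Counter(truth)                 # class sizes
    pairs = Counter(zip(labels, truth))  # counts of each outcome i|j
    return {(i, j): pairs[(i, j)] / n_j[j]
            for i in range(1, k + 1)
            for j in range(1, k + 1)
            if n_j[j] > 0}


# Hypothetical three-class sample.
truth = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
labels = [1, 1, 2, 2, 2, 3, 3, 3, 3, 2]
rates = classification_rates(truth, labels, k=3)
correct = {key: p for key, p in rates.items() if key[0] == key[1]}
errors = {key: p for key, p in rates.items() if key[0] != key[1]}
print(len(correct), len(errors))  # 3 correct outcomes, 6 = k**2 - k misclassifications
```

Within each true class $j$ the estimated probabilities sum to one, which is the property used later in Equation 2.18.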
[Figure 2.1 graphic: hypothetical distributions of feature (biomarker) values for diseased and non-diseased individuals, with the threshold ($\theta$) value chosen to label elements based on biomarker value; all elements on one side of the threshold are labeled "Non-Diseased".]

Figure 2.1: Example of a classification system in a medical setting where elements are either diseased or non-diseased. Hypothetical feature distributions for each class and a potential threshold (green line) used to label the elements as either "Diseased" or "Non-Diseased" are shown.


Table 2.2: Three-class contingency table where green cells correspond to correct classifications, $i = j$, and red cells correspond to misclassifications, $i \neq j$.

| CLASSIFICATION / TRUTH | CLASS 1 | CLASS 2 | CLASS 3 |
|---|---|---|---|
| "CLASS 1" | 1\|1 | 1\|2 | 1\|3 |
| "CLASS 2" | 2\|1 | 2\|2 | 2\|3 |
| "CLASS 3" | 3\|1 | 3\|2 | 3\|3 |

2.2 Receiver Operating Characteristic Curves

Receiver operating characteristic (ROC) curves are used to describe the performance of a CSF when there are two classes (see Figure 2.2). The ROC curve plots the true positive rate versus the false positive rate over all threshold values, $\theta \in \Theta$. This curve allows for interpretation of the trade-off between the true positive and false positive rates for varying thresholds. Thus, the ROC curve represents the performance of the entire CSF for all $\theta \in \Theta$.
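As a minimal sketch of how an ROC curve is traced out over $\theta \in \Theta$, the Python function below computes empirical (false positive rate, true positive rate) pairs, trying each observed feature value as a threshold. It assumes, as in Figure 2.1, that an element is labeled positive ("Diseased") when its feature value is below $\theta$; the function name and the data are hypothetical.

```python
# Illustrative sketch; the function name and data are hypothetical.


def empirical_roc(positives, negatives):
    """Empirical ROC points (FPR, TPR) for the rule: label "positive" if feature < theta.

    Every observed value is tried as a threshold, plus +infinity so that the
    curve ends at (1, 1). The smallest observed value gives the point (0, 0).
    """
    thresholds = sorted(set(positives) | set(negatives)) + [float("inf")]
    points = []
    for theta in thresholds:
        tpr = sum(x < theta for x in positives) / len(positives)
        fpr = sum(y < theta for y in negatives) / len(negatives)
        points.append((fpr, tpr))
    return points


# Hypothetical feature values: diseased (positive) elements tend to be lower.
roc = empirical_roc(positives=[1.0, 2.0, 3.0], negatives=[2.5, 4.0, 5.0])
for fpr, tpr in roc:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Both coordinates are non-decreasing in $\theta$, so the points run from (0, 0) to (1, 1) as described for the ROC plot.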
The ROC curve plots classification rates, so the curve is bounded between 0 and 1 on both the horizontal and vertical axes. The point on the ROC curve that represents perfect classification is (0, 1) (Figure 2.2). This point represents a perfect true positive rate (1) and a perfect false positive rate (0). Therefore, CSFs whose ROC curves approach this point are desired, and the single classification system closest to this point is optimal. In a two-class setting, the probability of correctly classifying due to random chance is 0.5. The line on the ROC plot that corresponds to chance classification is called the chance line and passes through the points (0, 0) and (1, 1) (Figure 2.2) [19]. A CSF performing worse than random chance would not be of interest, and therefore only CSFs whose ROC curves lie above the chance line are usually considered. Finally, when there are more than two classes, the ROC curve may be extended to an ROC surface by plotting the $k$ correct classification rates over all $\theta \in \Theta$ in a $k$-dimensional space, though only the three-class (3-dimensional) surface can be displayed graphically.


Figure 2.2: Receiver Operating Characteristic Curve.


2.3 Optimal Points

The single classification system resulting in the best classification performance for the CSF is said to occur at the optimal point (or points), corresponding to some $\theta \in \Theta$. For a two-class system, the optimal point is usually found where the probabilities of a true positive and of a true negative are jointly maximized (maximization of correct classification probabilities), or equivalently, where the false positive and false negative probabilities are jointly minimized (minimization of misclassification probabilities). Therefore, the optimal point reflects a compromise between the correct classification probabilities (or misclassification probabilities) [42]. The optimal point for a two-class CSF can be found using the ROC curve. If the prevalences of the classes and the costs associated with classification outcomes are considered equal for both classes, the optimal point occurs where the tangent line to the ROC curve is parallel to the chance line (i.e., the slope of the ROC curve is 1) [42]. This is equivalent to finding the point on the ROC curve with the greatest vertical distance from the chance line [54]. The threshold value(s) that produce this point are then chosen as the optimal threshold values for this CSF.

Extensive work in the literature suggests that costs associated with a classification system's outcomes should be taken into account when evaluating the system and estimating optimal thresholds [1, 30, 42, 58, 63-65, 67]. In addition to the costs of the classification outcomes, the prevalence of the different classes may be important when determining optimal settings for a CSF [9, 42]. If the a priori prevalences of the diseased and non-diseased (or target and non-target) classes, as well as the a priori costs associated with the decision outcomes, are taken into consideration, the CSF may have a different optimal point (see Figure 2.3) [19, 58, 67]. When prevalence and costs are considered, the optimal point occurs on the ROC curve where the slope is equal to [42]

$$\text{Slope} = \frac{1 - P_P}{P_P} \times \frac{c_{FP} - c_{TN}}{c_{FN} - c_{TP}} \tag{2.1}$$

where $P_P$ is the prevalence of the positive class, $c_{TN}$ is the cost of a true negative, $c_{FP}$ is the cost of a false positive, $c_{TP}$ is the cost of a true positive, and $c_{FN}$ is the cost of a false negative. Under the assumption of equal prevalences ($P_P = 0.5$) and equal costs of misclassification (and of correct classification), this slope is equal to one, as expected.
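Equation 2.1 can be checked numerically. The sketch below assumes a hypothetical binormal CSF (negatives $\sim N(0, 1)$, positives $\sim N(2, 1)$, with elements labeled positive when the feature exceeds $\theta$), a positive prevalence of 1/3, and $c_{FP} = c_{FN}$ with equal correct-classification costs, in the spirit of the green line in Figure 2.3. It minimizes the expected cost over a grid of thresholds and compares the ROC slope at the minimizer, $f_P(\theta)/f_N(\theta)$, with Equation 2.1. The function names and the specific cost values are illustrative assumptions.

```python
from statistics import NormalDist

# Illustrative sketch; function names, distributions, and costs are hypothetical.


def optimal_slope(p_pos, c_tp, c_tn, c_fp, c_fn):
    """ROC slope at the cost-optimal point, Equation 2.1."""
    return (1 - p_pos) / p_pos * (c_fp - c_tn) / (c_fn - c_tp)


def expected_cost(theta, pos, neg, p_pos, c_tp, c_tn, c_fp, c_fn):
    """Expected cost when elements with feature > theta are labeled positive."""
    tpr = 1 - pos.cdf(theta)  # P(labeled positive | positive)
    fpr = 1 - neg.cdf(theta)  # P(labeled positive | negative)
    return (p_pos * (c_tp * tpr + c_fn * (1 - tpr))
            + (1 - p_pos) * (c_fp * fpr + c_tn * (1 - fpr)))


# Hypothetical binormal CSF and cost structure.
neg, pos = NormalDist(0, 1), NormalDist(2, 1)
costs = {"c_tp": 1, "c_tn": 1, "c_fp": 5, "c_fn": 5}
p_pos = 1 / 3

grid = [t / 1000 for t in range(-2000, 4001)]  # thresholds from -2 to 4
theta_star = min(grid, key=lambda t: expected_cost(t, pos, neg, p_pos, **costs))
roc_slope = pos.pdf(theta_star) / neg.pdf(theta_star)  # dTPR/dFPR on this ROC curve

print(round(optimal_slope(p_pos, **costs), 3), round(roc_slope, 2))  # both about 2
```

Setting the derivative of the expected cost to zero gives exactly the ratio in Equation 2.1, which is why the two printed slopes agree up to the grid resolution.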


Figure 2.3: Different optimal points (in red) for the same CSF, determined by Equation 2.1. The orange line has a slope of one, representing equal class prevalences and equal costs associated with the classifications. Both the green and blue lines assume a positive class prevalence of 1/3. The blue line has a slope of 1/6 with $c_{FN} \gg c_{FP}$. The green line has a slope of 2 with $c_{FP} = c_{FN}$. For each line, $c_{TN} = c_{TP} = 1$.

The optimal point for a $k$-class classification system will usually correspond to at least $k - 1$ threshold values. For example, in order to classify subjects into three categories (HIV-negative (NEG), HIV-positive non-symptomatic (NAS), and HIV-positive with AIDS dementia complex (ADC)), two threshold values ($\theta_1 < \theta_2$) on a biomarker (NAA/Cr) may be used as a diagnostic test [45]. If a subject's NAA/Cr level is below $\theta_1$, they are classified as ADC; if the level is between $\theta_1$ and $\theta_2$, they are classified as NAS; and if the level is greater than $\theta_2$, they are classified as NEG [45] (see Figure 2.4).
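A rule with $k - 1$ ordered thresholds is straightforward to express in code. The sketch below, using Python's standard `bisect` module, maps a feature value to one of $k$ ordered labels. The threshold values in the example are hypothetical (they are not NAA/Cr cut-offs from [45]), and assigning a value equal to a threshold to the lower class is an assumption the text does not settle.

```python
from bisect import bisect_left

# Illustrative sketch; the function name and thresholds are hypothetical.


def classify(value, thresholds, labels):
    """Label a feature value using k-1 increasing thresholds and k ordered labels.

    labels[0] is used below thresholds[0], labels[-1] above thresholds[-1].
    A value equal to a threshold is assigned to the lower class (an assumption).
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # bisect_left counts the thresholds strictly below value
    return labels[bisect_left(thresholds, value)]


# Hypothetical thresholds theta1 < theta2 for the three HIV categories.
thetas = [1.2, 1.6]
hiv_labels = ["ADC", "NAS", "NEG"]
print([classify(v, thetas, hiv_labels) for v in (1.0, 1.4, 2.0)])  # ['ADC', 'NAS', 'NEG']
```

The same function handles any $k$, since only the number of thresholds below the value determines the label.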

2.4 Metrics for Optimal Points
2.4.1 The Youden Index.

The Youden index ($J$) was first introduced by W. J. Youden in 1950 as an index for rating diagnostic tests (or classification systems) with two classes [76]. The Youden index has been shown to be a useful metric for measuring a classification system's performance as a function of the correct classification probabilities [23, 45, 46, 50, 56, 76]. In a two-class framework, this index is defined as the sum of the system's specificity (true negative rate) and sensitivity (true positive rate) minus one. Using $J$, the optimal point of the classification system is found by choosing the threshold(s), $\theta \in \Theta$, that maximize $J$, thereby maximizing the correct classification probabilities. The thresholds associated with the maximum $J$ characterize the CSF at its optimal performance (with respect to correct classification) and correspond to the optimal point on the ROC curve where the slope is equal to one. Therefore, classification systems can be compared by calculating $J$:

$$J = \max_{\theta \in \Theta} \{\text{sensitivity}(\theta) + \text{specificity}(\theta) - 1\} \tag{2.2}$$

[Figure 2.4 graphic: NAA/Cr distributions for the three groups, with regions labeled "Classified as ADC", "Classified as NAS", and "Classified as NEG".]

Figure 2.4: Three-class classifications for the HIV example. Distributions of the NAA/Cr levels are plotted for ADC (black), NAS (red), and NEG (blue), as well as potential threshold values, $\theta_1$ and $\theta_2$, used to determine a subject's classification.

A classification system that performs worse than chance is generally not of interest, and therefore it is assumed that both sensitivity and specificity are bounded between 0.5 and 1. For this reason, $J = \text{sensitivity}(\theta) + \text{specificity}(\theta) - 1$ is bounded between 0 and 1 for systems performing better than chance [76].
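A minimal empirical version of Equation 2.2 is sketched below: each observed value is tried as a threshold, and the classification rule of Figure 2.1 (label "Diseased" when the feature is below $\theta$) is assumed. The function name and data are hypothetical, and ties are broken in favor of the smallest threshold, a choice the text does not prescribe.

```python
# Illustrative sketch; the function name and data are hypothetical.


def youden_index(diseased, non_diseased):
    """Empirical Youden index J (Equation 2.2) and the threshold attaining it.

    Rule: label "Diseased" when feature < theta, "Non-Diseased" otherwise.
    """
    best_j, best_theta = float("-inf"), None
    for theta in sorted(set(diseased) | set(non_diseased)) + [float("inf")]:
        sensitivity = sum(x < theta for x in diseased) / len(diseased)
        specificity = sum(y >= theta for y in non_diseased) / len(non_diseased)
        j = sensitivity + specificity - 1
        if j > best_j:  # strict '>' keeps the smallest maximizing threshold
            best_j, best_theta = j, theta
    return best_j, best_theta


# Hypothetical biomarker levels.
j, theta = youden_index(diseased=[1, 2, 3, 4, 6], non_diseased=[5, 7, 8])
print(round(j, 3), theta)  # 0.8 5
```

Because sensitivity is the true positive rate and $1 - \text{specificity}$ is the false positive rate, this search is the same as finding the ROC point with the greatest vertical distance from the chance line.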
Costs associated with the different classifications, as well as class prevalence, may be important in the determination of $J$. In fact, when $J$ is used without explicitly considering a cost structure, a cost and prevalence for each class is implicitly assumed: that of equal weight for all classes [55, 64]. Other costs may be considered by using a generalization of $J$ which incorporates a cost-benefit ratio weighted by class prevalence in the two-class framework [30, 63]. The generalized Youden index (GYI) for two classes is defined as

$$GYI = \max_{\theta \in \Theta} \left\{ \text{sensitivity}(\theta) + \frac{1 - P_P}{P_P} \times \frac{c_{FP} - c_{TN}}{c_{FN} - c_{TP}} \times \text{specificity}(\theta) - G \right\} \tag{2.3}$$

where $G$ is a constant determined by the prevalence of the positive class and the costs associated with the different decisions [30, 40, 63]. Notice that the prevalence/cost multiplier is the same as in Equation 2.1.

When there are more than two classes, $J$ is extended as the sum of the $k$ correct classification probabilities [45, 46]. Under this framework, the correct classification probabilities can no longer be distinguished as sensitivity and specificity, so instead the $k$ correct classification probabilities are labeled $P_{i=j|j}(\theta)$, where $j = 1, \ldots, k$ denotes the true class and $i = 1, \ldots, k$ denotes the classification system's labeled outcome. Then $J$ is redefined as

$$J = \max_{\theta \in \Theta} \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j = i}}^{k} P_{i=j|j}(\theta) \tag{2.4}$$


$J$ is generalized for classification systems with three or more classes by adding a multiplier (prevalence and/or utility) to each correct classification probability [45, 46]. The limitation of such an extension is that only the costs of the total misclassifications and the utility of the total correct classifications within each class are used. This ignores possibly different costs of class-specific misclassifications. For example, misclassifying stage 3 cancer as stage 2 may have a different cost than classifying stage 3 as stage 1.
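The limitation can be seen in a small hypothetical example. The two sets of three-class rates below have identical correct classification probabilities, and hence the same $k$-class $J$ (Equation 2.4 evaluated at a fixed $\theta$), but one system sends class-3 errors to class 2 and the other to class 1. With an assumed cost of 5 for labeling class 3 as class 1 (and 1 for every other error) and equal prevalences, a prevalence- and cost-weighted sum of misclassification probabilities, the idea behind Bayes Cost in Section 2.4.2, separates them. All numbers and function names are illustrative.

```python
# Illustrative sketch; rates, costs, and function names are hypothetical.


def k_class_j(rates, k):
    """Sum of the k correct classification probabilities at a fixed threshold."""
    return sum(rates[(j, j)] for j in range(1, k + 1))


def weighted_error(rates, costs, prevalence, k):
    """Sum over i != j of prevalence[j] * costs[(i, j)] * P(i|j)."""
    return sum(prevalence[j] * costs[(i, j)] * rates[(i, j)]
               for i in range(1, k + 1)
               for j in range(1, k + 1) if i != j)


# Hypothetical P(i|j), keyed by (assigned label i, true class j).
common = {(1, 1): 0.9, (2, 1): 0.1, (3, 1): 0.0,
          (1, 2): 0.1, (2, 2): 0.8, (3, 2): 0.1,
          (3, 3): 0.8}
system_a = {**common, (2, 3): 0.2, (1, 3): 0.0}  # stage 3 errors go to stage 2
system_b = {**common, (2, 3): 0.0, (1, 3): 0.2}  # stage 3 errors go to stage 1

k = 3
prevalence = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
costs = {(i, j): 1.0 for i in range(1, 4) for j in range(1, 4) if i != j}
costs[(1, 3)] = 5.0  # assumed: calling stage 3 "stage 1" is five times worse

print(k_class_j(system_a, k) == k_class_j(system_b, k))  # True
print(weighted_error(system_a, costs, prevalence, k),
      weighted_error(system_b, costs, prevalence, k))
```

System B has the larger weighted error even though $J$ cannot tell the two systems apart.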

Extensive work has derived formulas for determining $J$ and the optimal threshold(s) for CSFs under various distributional assumptions on the feature used for classification, focusing on the two-class framework [23, 33, 49, 54, 56]. An overview of these results is given in the following sections, separated into parametric and nonparametric methods.
2.4.1.1 Parametric Methods.

Assume two classes and a single feature used for classification, where the feature is independently and normally distributed within each class. Letting $X$ denote the first class and $Y$ the second class, denote $X_i \sim N(\mu_1, \sigma_1^2)$ for $i = 1, \ldots, n_1$ and $Y_l \sim N(\mu_2, \sigma_2^2)$ for $l = 1, \ldots, n_2$, and without loss of generality (WLOG) let $\mu_1 < \mu_2$ (see Figure 2.1). Recall that the probability density function (pdf) of the normal distribution is

$$f(w \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(w - \mu)^2}{2\sigma^2}}, \quad -\infty < w < \infty,\; -\infty < \mu < \infty,\; \sigma > 0 \tag{2.5}$$

Then the Youden index may be written as

$$J = \Phi\left(\frac{\mu_2 - \theta^*}{\sigma_2}\right) + \Phi\left(\frac{\theta^* - \mu_1}{\sigma_1}\right) - 1 \tag{2.6}$$


where $\Phi$ is the standard normal cumulative distribution function (CDF) [56]. Here, the maximum is omitted because $J$ is evaluated at the optimal threshold $\theta^*$. The closed-form solution for the optimal threshold, $\theta^* \in \Theta$, which maximizes Equation 2.6 is given by

$$\theta^* = \frac{\mu_1 (b^2 - 1) - a + b\sqrt{a^2 + 2(b^2 - 1)\sigma_1^2 \ln(b)}}{b^2 - 1} \tag{2.7}$$

where $a = \mu_2 - \mu_1$ and $b = \sigma_2 / \sigma_1$ [56]. If $\sigma_1 = \sigma_2$, this expression is undefined (since $b^2 - 1 = 0$), but in this case the optimal point is the midpoint between the distribution means [56]:

$$\theta^* = \frac{\mu_1 + \mu_2}{2} \tag{2.8}$$
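Equations 2.6 to 2.8 translate directly into code. The sketch below uses only the Python standard library (`statistics.NormalDist` supplies $\Phi$) and assumes $\mu_1 < \mu_2$ as in the text; the function names and parameter values are illustrative. As a sanity check, the closed-form $\theta^*$ should agree with a brute-force maximization of $J(\theta)$ over a fine grid.

```python
import math
from statistics import NormalDist

# Illustrative sketch; function names and parameters are hypothetical.
PHI = NormalDist().cdf  # standard normal CDF


def youden_at(theta, mu1, sd1, mu2, sd2):
    """J(theta) = Phi((mu2 - theta)/sd2) + Phi((theta - mu1)/sd1) - 1."""
    return PHI((mu2 - theta) / sd2) + PHI((theta - mu1) / sd1) - 1


def youden_threshold(mu1, sd1, mu2, sd2):
    """Optimal threshold: Equation 2.7, or Equation 2.8 when sd1 == sd2."""
    if math.isclose(sd1, sd2):
        return (mu1 + mu2) / 2
    a, b = mu2 - mu1, sd2 / sd1
    # (b**2 - 1) and log(b) share a sign, so the square root is always defined
    root = math.sqrt(a**2 + 2 * (b**2 - 1) * sd1**2 * math.log(b))
    return (mu1 * (b**2 - 1) - a + b * root) / (b**2 - 1)


# Hypothetical classes: X ~ N(0, 1), Y ~ N(2, 1.1^2).
params = (0.0, 1.0, 2.0, 1.1)
theta_star = youden_threshold(*params)
grid_best = max((t / 10000 for t in range(0, 20001)),
                key=lambda t: youden_at(t, *params))
print(round(theta_star, 3), round(grid_best, 3), round(youden_at(theta_star, *params), 4))
```

With equal standard deviations the function falls back to the midpoint of Equation 2.8, where the closed form of Equation 2.7 is undefined.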


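The GYI may also be rewritten using the normal CDF:

$$GYI = \Phi\left(\frac{\mu_2 - \theta^*}{\sigma_2}\right) + R \times \Phi\left(\frac{\theta^* - \mu_1}{\sigma_1}\right) - G \tag{2.9}$$

where

$$R = \frac{1 - P_P}{P_P} \times \frac{c_{FP} - c_{TN}}{c_{FN} - c_{TP}} \tag{2.10}$$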


Again, the maximization is omitted because this equation is evaluated at the optimal threshold. Accounting for fixed class prevalences and costs associated with the classification outcomes, the optimal threshold when $\sigma_1^2 = \sigma_2^2 = \sigma^2$ is

$$\theta^* = \frac{2\sigma^2 \ln(R) - (\mu_1^2 - \mu_2^2)}{2(\mu_2 - \mu_1)} \tag{2.11}$$

[30]. When $\sigma_1^2 \neq \sigma_2^2$, the optimal threshold is

$$\theta^* = \frac{\mu_1 (b^2 - 1) - a + b\sqrt{a^2 + 2(b^2 - 1)\sigma_1^2 \ln(R \times b)}}{b^2 - 1} \tag{2.12}$$

where $a = \mu_2 - \mu_1$ and $b = \sigma_2 / \sigma_1$ [63].
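Equations 2.10 to 2.12 can be sketched in the same way. Under the assumptions of the text ($\mu_1 < \mu_2$, fixed prevalences and costs), `gyi_threshold` below returns the optimum given by Equation 2.11 or 2.12; with $R = 1$ it reduces to Equation 2.8 or 2.7. The function names, the example parameters, and the error raised when the square-root argument is negative (a case the text does not discuss) are assumptions of this sketch. A grid search over Equation 2.9 is included as a check.

```python
import math
from statistics import NormalDist

# Illustrative sketch; function names and parameters are hypothetical.
PHI = NormalDist().cdf


def ratio_r(p_pos, c_tp, c_tn, c_fp, c_fn):
    """Prevalence/cost multiplier R of Equation 2.10."""
    return (1 - p_pos) / p_pos * (c_fp - c_tn) / (c_fn - c_tp)


def gyi_at(theta, mu1, sd1, mu2, sd2, r, g=0.0):
    """Equation 2.9 evaluated at an arbitrary theta (G defaults to 0 here)."""
    return PHI((mu2 - theta) / sd2) + r * PHI((theta - mu1) / sd1) - g


def gyi_threshold(mu1, sd1, mu2, sd2, r):
    """Optimal GYI threshold: Equation 2.11 (equal SDs) or 2.12 (unequal SDs)."""
    a = mu2 - mu1
    if math.isclose(sd1, sd2):
        return (2 * sd1**2 * math.log(r) - (mu1**2 - mu2**2)) / (2 * a)
    b = sd2 / sd1
    arg = a**2 + 2 * (b**2 - 1) * sd1**2 * math.log(r * b)
    if arg < 0:
        raise ValueError("no interior optimum for these parameters")
    return (mu1 * (b**2 - 1) - a + b * math.sqrt(arg)) / (b**2 - 1)


# Hypothetical example: positive prevalence 1/3, c_FP = c_FN = 5, c_TN = c_TP = 1.
r = ratio_r(1 / 3, c_tp=1, c_tn=1, c_fp=5, c_fn=5)  # R = 2
params = (0.0, 1.0, 2.0, 1.5)
theta_star = gyi_threshold(*params, r)
grid_best = max((t / 10000 for t in range(0, 30001)),
                key=lambda t: gyi_at(t, *params, r))
print(round(r, 6), round(theta_star, 3), round(grid_best, 3))
```

Compared with the Youden case, the only change in the unequal-variance formula is that $\ln(b)$ becomes $\ln(R \times b)$, so a larger $R$ moves the threshold upward.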

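When there are three classes, there is an additional class, $Z$, where $Z_m \sim N(\mu_3, \sigma_3^2)$ for $m = 1, \ldots, n_3$ (with $\mu_1 < \mu_2 < \mu_3$). $J$ is then defined as the sum of the three correct classification probabilities and can be expressed using the normal CDF as

$$J = \Phi\left(\frac{\theta_1^* - \mu_1}{\sigma_1}\right) + \Phi\left(\frac{\theta_2^* - \mu_2}{\sigma_2}\right) - \Phi\left(\frac{\theta_1^* - \mu_2}{\sigma_2}\right) - \Phi\left(\frac{\theta_2^* - \mu_3}{\sigma_3}\right) + 1 \tag{2.13}$$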


where $\theta_1^* < \theta_2^*$ are the optimal thresholds found to maximize $J$ [45]. These optimal thresholds can be found with Equation 2.7: $\theta_1^*$ is found with $a = \mu_2 - \mu_1$ and $b = \sigma_2 / \sigma_1$, and $\theta_2^*$ is found similarly with $a = \mu_3 - \mu_2$ and $b = \sigma_3 / \sigma_2$ (with $\mu_2$ and $\sigma_2$ taking the place of $\mu_1$ and $\sigma_1$) [36]. Although the GYI has not been extended to three classes, in [45] the three-class $J$ is generalized with weights on each correct classification probability. Therefore, weights could be added to Equation 2.13, and the optimal thresholds ($\theta_1^* < \theta_2^*$) would then be found numerically.
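For three normal classes, the two thresholds decouple: the derivative of Equation 2.13 with respect to $\theta_1$ involves only classes 1 and 2, and the derivative with respect to $\theta_2$ involves only classes 2 and 3. Each threshold can therefore be computed from Equation 2.7 (or 2.8) for a pair of adjacent classes and substituted into Equation 2.13. The sketch below assumes $\mu_1 < \mu_2 < \mu_3$; the names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

# Illustrative sketch; function names and parameters are hypothetical.
PHI = NormalDist().cdf


def pair_threshold(mu_lo, sd_lo, mu_hi, sd_hi):
    """Equation 2.7 (or 2.8 for equal SDs) for two adjacent classes."""
    if math.isclose(sd_lo, sd_hi):
        return (mu_lo + mu_hi) / 2
    a, b = mu_hi - mu_lo, sd_hi / sd_lo
    root = math.sqrt(a**2 + 2 * (b**2 - 1) * sd_lo**2 * math.log(b))
    return (mu_lo * (b**2 - 1) - a + b * root) / (b**2 - 1)


def three_class_youden(mu, sd):
    """Return (theta1, theta2, J) from Equations 2.7 and 2.13."""
    t1 = pair_threshold(mu[0], sd[0], mu[1], sd[1])
    t2 = pair_threshold(mu[1], sd[1], mu[2], sd[2])
    if not t1 < t2:
        raise ValueError("thresholds are not ordered; classes overlap too much")
    j = (PHI((t1 - mu[0]) / sd[0]) + PHI((t2 - mu[1]) / sd[1])
         - PHI((t1 - mu[1]) / sd[1]) - PHI((t2 - mu[2]) / sd[2]) + 1)
    return t1, t2, j


theta1, theta2, j3 = three_class_youden(mu=(0.0, 2.0, 4.0), sd=(1.0, 1.0, 1.0))
print(theta1, theta2, round(j3, 4))  # 1.0 3.0 2.3654
```

In this equal-variance example, $J = 4\Phi(1) - 1$, the sum of three correct classification probabilities that can be at most 3.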

Finally, for all forms of $J$ and GYI, if the classification feature is log-normally distributed, the point estimate of the threshold is determined using log-transformed data. A similar development is presented in [56] for $J$ with two classes and a gamma-distributed feature. More generally, for features distributed within the Box-Cox family, transformations to normality may be used and the formulas assuming normality applied [30, 45].

2.4.1.2 Nonparametric Methods.

For any number of classes, if no distributional assumptions about the feature used for classification are made, $J$ can be defined using the empirical CDF. The empirical CDF, $F_n(x)$, of a random sample of size $n$ is defined as

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x) \tag{2.14}$$

where $I$ is the indicator function, equal to 1 if the relation is true and 0 otherwise [32]. For example, in a three-class scenario ($X < Y < Z$), $J$ may be defined as

$$J = F(\theta_1^*) + G(\theta_2^*) - G(\theta_1^*) - H(\theta_2^*) + 1 \tag{2.15}$$
where $F(\theta) = \frac{1}{n_1}\sum_{i=1}^{n_1} I(X_i \le \theta)$, $G(\theta) = \frac{1}{n_2}\sum_{l=1}^{n_2} I(Y_l \le \theta)$, and $H(\theta) = \frac{1}{n_3}\sum_{m=1}^{n_3} I(Z_m \le \theta)$ are the empirical CDFs of the three classes, and $\theta_1^*$ and $\theta_2^*$ are the thresholds found to maximize Equation 2.15 [45]. Methods that have been used to determine the optimal thresholds include applying a smoothing kernel to the empirical CDFs, choosing the observations at which the maximum occurs, and random walks [45, 63].
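One of the approaches listed above, choosing the observed values at which the maximum occurs, can be sketched as an exhaustive search over pairs of pooled observations. This is an $O(n^3)$ illustration written for clarity rather than speed, with hypothetical function names and data. It labels a value as class 1 if it is at most $\theta_1$, as class 2 if it lies in $(\theta_1, \theta_2]$, and as class 3 otherwise, matching Equations 2.14 and 2.15.

```python
# Illustrative sketch; function names and data are hypothetical.


def ecdf(sample):
    """Empirical CDF F_n(x) = (1/n) * #{observations <= x}, Equation 2.14."""
    n = len(sample)
    return lambda x: sum(v <= x for v in sample) / n


def empirical_three_class_youden(x, y, z):
    """Maximize Equation 2.15 over observed values theta1 < theta2.

    Returns (J, theta1, theta2); ties keep the first pair found.
    """
    F, G, H = ecdf(x), ecdf(y), ecdf(z)
    candidates = sorted(set(x) | set(y) | set(z))
    best = (float("-inf"), None, None)
    for idx, t1 in enumerate(candidates):
        for t2 in candidates[idx + 1:]:
            j = F(t1) + G(t2) - G(t1) - H(t2) + 1
            if j > best[0]:
                best = (j, t1, t2)
    return best


# Hypothetical, perfectly separated samples: J reaches its maximum of 3.
print(empirical_three_class_youden([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (3.0, 3, 6)
```

Because the empirical CDFs only change at observed values, restricting the search to observed values loses nothing for this step-function version of $J$.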

All forms of $J$ presented may be extended to the $k$-class $J$, where again, weights may be placed on the correct classification probabilities to incorporate the importance of the different correct decisions in finding the optimal point [46]. Other work on $J$ includes consideration of special cases such as pooled samples, corrections for measurement error, and methods for when the feature distribution has a mass at zero [49, 54, 55].

2.4.2 Bayes Cost.

The optimal threshold found by maximizing the correct classification probabilities (via $J$) is equivalent to that found by minimizing the misclassification probabilities in a two-class framework [6, 50]. When there are more than two classes and unequal costs associated with the misclassifications within each class, the equivalence between optimal thresholds found by maximizing correct classification probabilities and by minimizing misclassification probabilities is not universally true. This is because it is no longer feasible to assign a simple cost-benefit ratio between the benefit of making a correct decision and the costs of making an incorrect decision [58, 65]. Therefore, finding the optimal settings can be more complex when a classification system has more than two classes. In order to assign differing costs or benefits to the potential outcomes of a $k$-class classification system, a metric that considers all differing misclassification probabilities should be used instead of extensions of $J$.

A $k$-class classification system results in a total of $k^2$ correct classification and misclassification probabilities; however, $J$ only uses $k$ pieces of information (the $k$ correct classification probabilities). Therefore, by using $J$, $k^2 - k$ pieces of information about the classification system may be lost, namely information about the class-specific misclassifications. A metric developed on the $k^2 - k$ error probabilities will lose no information about the system [58] (see Theorem 1).

For this reason, the development of a metric associated with the misclassification probabilities is of interest. Bayes Cost (BC) is a metric, presented in [65], that minimizes a weighted sum of the misclassification probabilities for three or more classes. This metric allows the misclassification probabilities to be weighted by the cost and class prevalence associated with each misclassification outcome:

$$\text{Bayes Cost} = \min_{\theta \in \Theta} \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \neq i}}^{k} c_{i|j}\, p_j\, P_{i|j}(\theta) \tag{2.16}$$

where $c_{i|j}$ is the fixed cost associated with misclassifying class $j$ as class $i$ and $p_j$ is the fixed prevalence of the $j$th class. Therefore, BC allows for the use of any cost/prevalence structure on both the correct and misclassification probabilities.
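To illustrate Equation 2.16, the sketch below evaluates Bayes Cost for a hypothetical three-class CSF with normally distributed features and two thresholds, and minimizes it by a simple grid search over $\theta_1 < \theta_2$. The distributions, prevalences, costs, and grid are assumptions chosen for illustration, and grid search is just one convenient optimizer, not a method prescribed by [65]. With equal costs and prevalences, Bayes Cost equals $(k - \sum_j P_{j|j}(\theta))/k$, so the minimizer should match the Youden thresholds (1, 3); raising the cost of labeling class 3 as class 2 should move $\theta_2$ down.

```python
from statistics import NormalDist

# Illustrative sketch; distributions, costs, and function names are hypothetical.


def class_probabilities(theta1, theta2, dists):
    """P(i|j) for three ordered classes: label 1 if value <= theta1,
    2 if theta1 < value <= theta2, and 3 otherwise."""
    probs = {}
    for j, d in enumerate(dists, start=1):
        c1, c2 = d.cdf(theta1), d.cdf(theta2)
        probs[(1, j)], probs[(2, j)], probs[(3, j)] = c1, c2 - c1, 1 - c2
    return probs


def bayes_cost(theta1, theta2, dists, prevalence, costs):
    """Objective of Equation 2.16 at fixed thresholds (before minimizing)."""
    p = class_probabilities(theta1, theta2, dists)
    return sum(prevalence[j - 1] * costs[(i, j)] * p[(i, j)]
               for (i, j) in p if i != j)


def minimize_bayes_cost(dists, prevalence, costs, lo, hi, steps=200):
    """Grid search over theta1 < theta2; returns (cost, theta1, theta2)."""
    grid = [lo + (hi - lo) * s / steps for s in range(steps + 1)]
    return min((bayes_cost(t1, t2, dists, prevalence, costs), t1, t2)
               for a, t1 in enumerate(grid) for t2 in grid[a + 1:])


dists = [NormalDist(0, 1), NormalDist(2, 1), NormalDist(4, 1)]
prevalence = [1 / 3, 1 / 3, 1 / 3]
equal = {(i, j): 1.0 for i in range(1, 4) for j in range(1, 4) if i != j}
skewed = {**equal, (2, 3): 5.0}  # assumed: labeling class 3 as class 2 costs 5

_, t1_eq, t2_eq = minimize_bayes_cost(dists, prevalence, equal, lo=-2, hi=6)
_, t1_sk, t2_sk = minimize_bayes_cost(dists, prevalence, skewed, lo=-2, hi=6)
print(t1_eq, t2_eq)  # 1.0 3.0 (the Youden thresholds)
print(t1_sk, t2_sk)  # theta2 moves below 3.0
```

In the skewed case the $\theta_2$ terms balance where $f_2(\theta_2) = 5 f_3(\theta_2)$, which for these distributions is near 2.2, so the grid minimizer should land close to that value.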


Theorem 1. Using Bayes Cost to determine the optimal thresholds of a multi-state classification system allows for the use of any cost/prevalence structure on any of the correct or misclassification probabilities, and therefore no information about the classification system is lost.


Proof. Let the prevalence of class $j$ be denoted $p_j$, the cost of a misclassification be $m_{i\neq j|j}$, and the cost or benefit of a correct classification be $b_{i=j|j}$ (a benefit being entered as a negative value), where the true class is denoted $j = 1, 2, \ldots, k$ and the classification outcomes are denoted $i = 1, 2, \ldots, k$. The cost function to minimize would be

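$$\text{Cost} = \min_{\theta \in \Theta} \left[ \sum_{j=1}^{k} \sum_{\substack{i=1 \\ i \neq j}}^{k} p_j\, m_{i\neq j|j}\, P_{i\neq j|j}(\theta) + \sum_{j=1}^{k} p_j\, b_{i=j|j}\, P_{i=j|j}(\theta) \right] \qquad (2.17)$$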


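Note that, since the classification outcomes in each class are mutually exclusive and exhaustive and the sample size of each class ($n_j$) is fixed,

$$\sum_{i=1}^{k} P_{i|j}(\theta) = 1, \quad \text{for each } j = 1, 2, \ldots, k, \qquad (2.18)$$

which implies

$$P_{i=j|j}(\theta) = 1 - \sum_{\substack{i=1 \\ i \neq j}}^{k} P_{i\neq j|j}(\theta), \quad \text{for each } j = 1, 2, \ldots, k. \qquad (2.19)$$

Substituting Equation 2.19 into Equation 2.17 gives

$$
\begin{aligned}
\text{Cost} &= \min_{\theta \in \Theta} \left[ \sum_{j=1}^{k} \sum_{i \neq j} p_j\, m_{i\neq j|j}\, P_{i\neq j|j}(\theta) + \sum_{j=1}^{k} p_j\, b_{i=j|j} \Big( 1 - \sum_{i \neq j} P_{i\neq j|j}(\theta) \Big) \right] \\
&= \min_{\theta \in \Theta} \left[ \sum_{j=1}^{k} \sum_{i \neq j} p_j\, m_{i\neq j|j}\, P_{i\neq j|j}(\theta) + \sum_{j=1}^{k} \Big( p_j\, b_{i=j|j} - \sum_{i \neq j} p_j\, b_{i=j|j}\, P_{i\neq j|j}(\theta) \Big) \right] \\
&= \min_{\theta \in \Theta} \left[ \sum_{j=1}^{k} \sum_{i \neq j} p_j\, m_{i\neq j|j}\, P_{i\neq j|j}(\theta) - \sum_{j=1}^{k} \sum_{i \neq j} p_j\, b_{i=j|j}\, P_{i\neq j|j}(\theta) \right] + \text{constant} \\
&= \min_{\theta \in \Theta} \left[ \sum_{j=1}^{k} \sum_{i \neq j} p_j \left( m_{i\neq j|j} - b_{i=j|j} \right) P_{i\neq j|j}(\theta) \right] + \text{constant} \\
&= \min_{\theta \in \Theta} \left[ \sum_{i=1}^{k} \sum_{j \neq i} c_{i|j}\, p_j\, P_{i|j}(\theta) \right] + \text{constant}, \quad \text{where } c_{i|j} = m_{i\neq j|j} - b_{i=j|j} \\
&= \text{Bayes Cost} + \text{constant}.
\end{aligned} \qquad (2.20)
$$

Because the constant, $\sum_{j=1}^{k} p_j\, b_{i=j|j}$, does not depend on $\theta$,

$$\theta^*_{\text{Cost}} = \theta^*_{BC}.$$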

This  demonstrates  that  the  optimal  thresholds  found  by  minimizing  Bayes  Cost  are  equivalent  to 
those  found  by  minimizing  a  function  which  uses  all  classification  outcome  probabilities  from  the 
classification  system,  allowing  for  any  cost/benefit  and  prevalence  structures  to  be  considered.  □ 
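As a numerical illustration of Theorem 1, the sketch below evaluates the objective of Equation 2.17 and Bayes Cost (Equation 2.16, with $c_{i|j} = m_{i\neq j|j} - b_{i=j|j}$) for a hypothetical three-class system with one normally distributed feature, and checks that their difference does not change with the thresholds. The class distributions, costs, and prevalences are made-up values, benefits are entered as negative costs, and the function names are illustrative rather than taken from [58, 65].

```python
from statistics import NormalDist


def outcome_probs(thresholds, dists):
    """P[i][j] = probability that a class-j observation receives label i.

    Label i (0-based) is assigned when theta_{i-1} < x <= theta_i, with
    theta_{-1} = -inf and theta_{k-1} = +inf (single-feature threshold rule).
    """
    k = len(dists)
    P = [[0.0] * k for _ in range(k)]
    for j, dist in enumerate(dists):
        edges = [0.0] + [dist.cdf(t) for t in thresholds] + [1.0]
        for i in range(k):
            P[i][j] = edges[i + 1] - edges[i]
    return P


def full_cost(P, prev, m, b):
    """Objective inside Eq. 2.17: misclassification costs plus correct-class terms."""
    k = len(prev)
    miss = sum(prev[j] * m[i][j] * P[i][j] for i in range(k) for j in range(k) if i != j)
    correct = sum(prev[j] * b[j] * P[j][j] for j in range(k))
    return miss + correct


def bayes_cost(P, prev, c):
    """Objective inside Eq. 2.16 with c[i][j] = c_{i|j}."""
    k = len(prev)
    return sum(c[i][j] * prev[j] * P[i][j] for i in range(k) for j in range(k) if i != j)


# Hypothetical three-class system (values chosen only for illustration).
dists = [NormalDist(-3, 1), NormalDist(0, 1.5), NormalDist(3, 1)]
prev = [0.2, 0.5, 0.3]
m = [[0, 2, 4], [1, 0, 1], [5, 2, 0]]  # m[i][j]: cost of labelling class j as class i
b = [-1.0, -0.5, -2.0]                 # b[j]: cost of a correct class-j label (negative = benefit)
c = [[m[i][j] - b[j] for j in range(3)] for i in range(3)]

for thresholds in [(-2.0, 1.0), (-1.5, 1.5), (-1.0, 2.5)]:
    P = outcome_probs(thresholds, dists)
    print(thresholds, round(full_cost(P, prev, m, b) - bayes_cost(P, prev, c), 12))
```

Each printed difference equals $\sum_j p_j b_{i=j|j} = -1.05$, so both objectives are minimized by the same thresholds.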


Assume a three-class classification system with a single feature used for classification that is independently and normally distributed for each class, with class means $\mu_1 < \mu_2 < \mu_3$ and standard deviations $\sigma_1, \sigma_2, \sigma_3$. Under this framework, BC can be expressed with the standard normal CDF, $\Phi$, and the optimal thresholds that distinguish between the classes and minimize BC, $\theta_1^* < \theta_2^*$, as:


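$$
\begin{aligned}
BC_3 = {}& c_{2|1}\,p_1 \left[ \Phi\!\left( \frac{\theta_2^* - \mu_1}{\sigma_1} \right) - \Phi\!\left( \frac{\theta_1^* - \mu_1}{\sigma_1} \right) \right] + c_{3|1}\,p_1\, \Phi\!\left( \frac{\mu_1 - \theta_2^*}{\sigma_1} \right) \\
&+ c_{1|2}\,p_2\, \Phi\!\left( \frac{\theta_1^* - \mu_2}{\sigma_2} \right) + c_{3|2}\,p_2\, \Phi\!\left( \frac{\mu_2 - \theta_2^*}{\sigma_2} \right) \\
&+ c_{1|3}\,p_3\, \Phi\!\left( \frac{\theta_1^* - \mu_3}{\sigma_3} \right) + c_{2|3}\,p_3 \left[ \Phi\!\left( \frac{\theta_2^* - \mu_3}{\sigma_3} \right) - \Phi\!\left( \frac{\theta_1^* - \mu_3}{\sigma_3} \right) \right]
\end{aligned} \qquad (2.21)
$$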


The minimization is not expressed in Equation 2.21, as it is achieved by using the optimal thresholds. The optimal thresholds must be found numerically when the $c_{i|j}p_j$ are not all equal, for $i \neq j$. When all $c_{i|j}p_j$ are equal, for $i \neq j$, Equation 2.7 may be used to find the optimal threshold between each pair of adjacent normal distributions. Equation 2.21 may be extended to any $k$ classes with a single feature used for classification that is independently and normally distributed for each class, and would require $k - 1$ optimal thresholds.
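A minimal sketch of Equation 2.21 and of the numerical search for $\theta_1^* < \theta_2^*$ follows. It assumes the normal parameters are known (in practice they would be estimated), stores the products $c_{i|j}p_j$ in a matrix `cp`, and replaces a proper optimizer with a brute-force grid search, which is slow but transparent; the grid bounds and step are arbitrary choices. With $X_1 \sim N(-3,1)$, $X_2 \sim N(0,1)$, $X_3 \sim N(3,1)$ and every $c_{i|j}p_j = 1$, Equation 2.21 evaluated at $(\theta_1^*, \theta_2^*) = (-1.5, 1.5)$ reduces to $4\Phi(-1.5) \approx 0.267$.

```python
from statistics import NormalDist


def misclassification_cost(cdf_at_t1, cdf_at_t2, cp):
    """Sum of c_{i|j} p_j P_{i|j} (Eq. 2.21) from each class CDF at the two thresholds.

    cp[i][j] holds the product c_{i|j} * p_j (0-based indices; the diagonal is ignored).
    Rule: label 1 if x <= theta_1, label 2 if theta_1 < x <= theta_2, label 3 otherwise.
    """
    total = 0.0
    for j in range(3):
        probs = (cdf_at_t1[j], cdf_at_t2[j] - cdf_at_t1[j], 1.0 - cdf_at_t2[j])
        for i in range(3):
            if i != j:
                total += cp[i][j] * probs[i]
    return total


def bc3(theta1, theta2, dists, cp):
    """Evaluate Eq. 2.21 for fixed thresholds theta1 <= theta2."""
    return misclassification_cost([d.cdf(theta1) for d in dists],
                                  [d.cdf(theta2) for d in dists], cp)


def minimize_bc3(dists, cp, lo, hi, step=0.025):
    """Grid search over lo <= theta1 <= theta2 <= hi; returns (BC, theta1*, theta2*)."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + s * step for s in range(n)]
    cdf_table = [[d.cdf(t) for d in dists] for t in grid]  # CDFs computed once per grid point
    best = (float("inf"), None, None)
    for a in range(n):
        for b in range(a, n):  # enforce theta1 <= theta2
            value = misclassification_cost(cdf_table[a], cdf_table[b], cp)
            if value < best[0]:
                best = (value, grid[a], grid[b])
    return best


dists = [NormalDist(-3, 1), NormalDist(0, 1), NormalDist(3, 1)]
cp = [[1.0] * 3 for _ in range(3)]
print(minimize_bc3(dists, cp, lo=-6.0, hi=6.0))  # roughly (0.267, -1.5, 1.5)
```

Precomputing the CDF table keeps the double loop to simple arithmetic; for more than three classes or finer grids a numerical optimizer would be the natural replacement.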

When there are two classes, the optimal threshold found by minimizing BC is equivalent to that found by maximizing the GYI, Equations 2.11 or 2.12 (assuming the same costs and prevalences are used to find the optimal point). A proof of this equivalence is given in Section 2.5.2. Also, if all $c_{i|j}p_j$ are equal, for $i \neq j$, the optimal threshold(s) found with BC would be equivalent to those found by maximizing $J$.

In a nonparametric setting, BC can be estimated using the empirical distribution function. Letting $\theta = (\theta_1 < \theta_2 < \cdots < \theta_k)$ and $\hat{F}_j$ be the empirical CDF for class $j$, with $\hat{F}_{j-1} < \hat{F}_j$, for all $k$ classes, BC is estimated as


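$$\widehat{BC} = \min_{\theta \in \Theta} \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \neq i}}^{k} c_{i|j}\, p_j \left[ \hat{F}_j(\theta_i) - \hat{F}_j(\theta_{i-1}) \right] \qquad (2.22)$$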


where $\hat{F}_j(\theta_0) = 0$ and $\hat{F}_j(\theta_k) = 1$ [65]. The optimal thresholds are then those which minimize Equation 2.22.
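The empirical version in Equation 2.22 can be sketched for $k = 3$ as follows. Because each $\hat{F}_j$ only changes at observed values, it is enough to search thresholds over the pooled sample values (plus one value below all of them); this exhaustive search, the helper names, and the toy data are illustrative assumptions rather than the procedure of [65]. Ties between equally good threshold pairs are broken by keeping the first pair found.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF t -> #{x <= t} / n of a sample."""
    ordered = sorted(sample)
    return lambda t: bisect_right(ordered, t) / len(ordered)


def empirical_bc(thresholds, cdfs, cp):
    """Eq. 2.22 at fixed thresholds; cp[i][j] = c_{i|j} * p_j (0-based, diagonal ignored)."""
    k = len(cdfs)
    total = 0.0
    for j, F in enumerate(cdfs):
        # F_j(theta_0) = 0 and F_j(theta_k) = 1, as in Eq. 2.22
        edges = [0.0] + [F(t) for t in thresholds] + [1.0]
        for i in range(k):
            if i != j:
                total += cp[i][j] * (edges[i + 1] - edges[i])
    return total


def minimize_empirical_bc3(samples, cp):
    """Exhaustive search over pooled observations for theta1 <= theta2 (three classes)."""
    cdfs = [ecdf(s) for s in samples]
    pooled = sorted({x for s in samples for x in s})
    candidates = [pooled[0] - 1.0] + pooled  # first entry labels nothing as class 1
    best = (float("inf"), None, None)
    for a, t1 in enumerate(candidates):
        for t2 in candidates[a:]:
            value = empirical_bc([t1, t2], cdfs, cp)
            if value < best[0]:
                best = (value, t1, t2)
    return best


samples = [[1.2, 2.0, 2.9, 4.1], [3.5, 4.4, 5.0, 6.3], [6.0, 7.1, 8.2, 9.0]]
cp = [[1.0] * 3 for _ in range(3)]
print(minimize_empirical_bc3(samples, cp))  # (0.5, 2.9, 5.0)
```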


2.5  Confidence  on  Optimal  Point  Metrics 

It is critical to characterize the uncertainty in an optimal point, as such estimates are typically constructed from data. This is most commonly accomplished by creating confidence intervals (CIs) around the metric used to characterize the optimal point (Youden index, Bayes Cost, etc.) as well as creating confidence interval(s) around the threshold(s) which correspond to the optimal point [30, 33, 49, 56, 76].

CIs are a statistical inference method that provide a range of values (usually an interval) for which there is a specified level of confidence that the true parameter lies within the interval. CIs may be constructed as either one or two sided (one sided being of the form where there is either a lower or upper bound, but not both). This work focuses on constructing two-sided confidence intervals. If $\mathbf{X} = (X_1, \ldots, X_n)$ is a random sample, then $L(\mathbf{X})$ and $U(\mathbf{X})$ form a confidence interval with confidence coefficient $1 - \alpha$ for some function of the parameter $\theta$, $\tau(\theta)$, such that $P[L(\mathbf{X}) \le \tau(\theta) \le U(\mathbf{X})] = 1 - \alpha$ [12, p. 417], [44, p. 377]. Because the upper and lower bounds of the CI are functions of the observed data, the notation for the bounds may be simplified by writing $L(\mathbf{X})$ as $\tau(\theta)_L$ and $U(\mathbf{X})$ as $\tau(\theta)_U$.

Not all CIs perform equally well. An interval's coverage probability and length are metrics of a CI's performance. If a CI with a confidence coefficient of $1 - \alpha$ is constructed from many independent samples (for example, 100 times), approximately $(1 - \alpha)100\%$ of the intervals are expected to contain the true parameter of interest. This may not always be the case, and the long-run proportion of constructed CIs that contain the true parameter is the coverage probability of the CI. The coverage probability should be at least $(1 - \alpha)100\%$ for a well-performing CI. CIs with coverage probability greater than $(1 - \alpha)100\%$ are considered conservative.

Among all CIs that meet the desired coverage probability, it is then of interest to find the interval with the shortest length. The length of an interval is defined as $\tau(\theta)_U - \tau(\theta)_L$. A shorter CI that meets the desired coverage probability provides a more precise (and therefore, arguably, more useful) estimate of the parameter. Another metric of CI performance is its symmetry, which may be used to judge whether or not the true parameter of interest lies in the center of the interval, or if the interval is skewed to one side. The mean squared error and bias of the parameter estimate may impact CI performance and are therefore also sometimes considered, though these are not properties of the interval itself.
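Coverage probability and mean length can be estimated by simulation. The sketch below does this for the familiar Wald interval for a binomial proportion, chosen only because it is short; the true $p$, sample sizes, number of replications, and seed are arbitrary assumptions, and the same loop applies to any CI procedure.

```python
import random
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Large-sample (Wald) CI for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = successes / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


def coverage_and_length(p, n, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo estimate of coverage probability and mean length of the Wald CI."""
    rng = random.Random(seed)
    covered, total_length = 0, 0.0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))  # one binomial(n, p) draw
        lower, upper = wald_interval(successes, n, alpha)
        covered += lower <= p <= upper
        total_length += upper - lower
    return covered / reps, total_length / reps


print(coverage_and_length(p=0.3, n=100))
print(coverage_and_length(p=0.05, n=20))
```

For moderate $np$ the estimated coverage should sit near the nominal 95%, whereas for small $np$ (the second call) the Wald interval is known to fall well short of its nominal level, which is the kind of behaviour this simulation is meant to expose.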

2.5.1  Confidence  on  the  Youden  Index  and  Optimal  Thresholds. 

Several methods exist in the literature for constructing CIs around $J$ and the optimal threshold(s), mainly in a two-class setting. In addition to these methods, bootstrap methods are also applicable, as bootstrap CIs are a general and flexible method that may be used under any distributional assumptions on the features and any classification system structure. First, parametric CI methods are presented; following these, the nonparametric CI methods available for $J$ are presented.

A delta method approximation, which uses first-order Taylor series expansions to determine the variance of $J$ and the optimal threshold(s), has been implemented to create CIs for $J$ and the resulting optimal threshold(s) for a classification system with two or three classes and a single feature that is independently and normally distributed for each class [36, 56, 63, 64]. The delta method $(1 - \alpha)100\%$ CI around $J$ is


$$\hat{J} \pm z_{\alpha/2} \sqrt{\widehat{Var}(\hat{J})} \qquad (2.23)$$


where $\hat{J}$ is estimated using Equation 2.6 for two classes or Equation 2.13 for three classes, and $Var(\hat{J})$ is approximated with the delta method as:


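$$Var(\hat{J}) \approx \sum_{j=1}^{k} \left[ \left( \frac{\partial J}{\partial \mu_j} \right)^2 Var(\hat{\mu}_j) + \left( \frac{\partial J}{\partial \sigma_j} \right)^2 Var(\hat{\sigma}_j) \right] \qquad (2.24)$$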


The  covariance  term  from  the  delta  method  approximation  is  zero,  due  to  the  assumption  of 
independence  between  the  classes’  feature  distributions. 

Assuming two classes and a normally distributed feature, $X_i \sim N(\mu_1, \sigma_1^2)$ for $i = 1, \ldots, n_1$, $Y_i \sim N(\mu_2, \sigma_2^2)$ for $i = 1, \ldots, n_2$, and $\mu_1 < \mu_2$, $Var(\hat{J})$ in Equation 2.24 is estimated by:


$$\widehat{Var}(\hat{J}) = \frac{s_1^2}{n_1} \left( \frac{\partial J}{\partial \mu_1} \right)^2 + \frac{s_1^2}{2(n_1 - 1)} \left( \frac{\partial J}{\partial \sigma_1} \right)^2 + \frac{s_2^2}{n_2} \left( \frac{\partial J}{\partial \mu_2} \right)^2 + \frac{s_2^2}{2(n_2 - 1)} \left( \frac{\partial J}{\partial \sigma_2} \right)^2 \qquad (2.25)$$


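where $\bar{x}$, $\bar{y}$ and $s_1$, $s_2$ are the sample means and standard deviations, the partial derivatives of $J = \Phi(z_1) - \Phi(z_2)$ are evaluated at these estimates, $z_1 = (\hat{\theta}^* - \bar{x})/s_1$, $z_2 = (\hat{\theta}^* - \bar{y})/s_2$, $a = \bar{y} - \bar{x}$, $b = s_2/s_1$, $rad = a^2 + (b^2 - 1)s_1^2 \ln(b^2)$, and $\phi$ represents the standard normal pdf [56]; the closed-form expressions for these derivatives in [56] are written in terms of these quantities. A similar formulation of the approximation of $Var(\hat{J})$ is used for the delta method CI for the three-class $J$.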
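The sketch below follows Equations 2.23 and 2.24 for two normal classes but, as a labeled simplification, differentiates $J$ numerically (central differences) instead of using the closed-form derivatives of [56]. The threshold comes from solving $\phi((\theta - \mu_1)/\sigma_1)/\sigma_1 = \phi((\theta - \mu_2)/\sigma_2)/\sigma_2$, rearranged so that it stays finite when $s_1 = s_2$; it is assumed, not verified here, to coincide with Equation 2.7. $Var(\hat{\mu}_j) \approx s_j^2/n_j$ and $Var(\hat{\sigma}_j) \approx s_j^2/(2(n_j - 1))$ are normal-theory approximations, and the toy samples are made up.

```python
from math import log, sqrt
from statistics import NormalDist, mean, stdev

PHI = NormalDist()  # standard normal


def optimal_threshold(mu1, s1, mu2, s2):
    """Maximizer of Phi((t - mu1)/s1) - Phi((t - mu2)/s2), assuming mu1 < mu2.

    Root of phi((t-mu1)/s1)/s1 = phi((t-mu2)/s2)/s2, written so it is finite when s1 == s2.
    """
    a = mu2 - mu1
    log_b2 = log((s2 / s1) ** 2)
    rad = a * a + (s2 ** 2 - s1 ** 2) * log_b2  # always >= a^2 > 0
    return mu1 + (a * a + s2 ** 2 * log_b2) / (a + (s2 / s1) * sqrt(rad))


def youden(mu1, s1, mu2, s2):
    t = optimal_threshold(mu1, s1, mu2, s2)
    return PHI.cdf((t - mu1) / s1) - PHI.cdf((t - mu2) / s2)


def delta_ci_youden(x, y, alpha=0.05, h=1e-5):
    """Delta-method CI (Eqs. 2.23-2.24) with numerically differentiated J."""
    params = [mean(x), stdev(x), mean(y), stdev(y)]
    n1, n2 = len(x), len(y)
    # Var(mean) ~ s^2/n and Var(s) ~ s^2/(2(n-1)) under normality
    variances = [params[1] ** 2 / n1, params[1] ** 2 / (2 * (n1 - 1)),
                 params[3] ** 2 / n2, params[3] ** 2 / (2 * (n2 - 1))]
    var_j = 0.0
    for idx, var_param in enumerate(variances):
        up, down = params.copy(), params.copy()
        up[idx] += h
        down[idx] -= h
        grad = (youden(*up) - youden(*down)) / (2 * h)
        var_j += grad ** 2 * var_param  # independent estimates: no covariance terms
    j_hat = youden(*params)
    half_width = PHI.inv_cdf(1 - alpha / 2) * sqrt(var_j)
    return j_hat, var_j, (j_hat - half_width, j_hat + half_width)


x = [-1.0, 1.0] * 25  # toy class-1 sample
y = [1.0, 3.0] * 25   # toy class-2 sample
print(delta_ci_youden(x, y))
```

Because $\partial J/\partial \theta = 0$ at the optimal threshold, the numerical gradients should agree with $-\phi(z_1)/s_1$, $-z_1\phi(z_1)/s_1$, $\phi(z_2)/s_2$, and $z_2\phi(z_2)/s_2$, which gives a quick check on any closed-form implementation.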

The $(1 - \alpha)100\%$ CI(s) for the optimal threshold(s) (two or three classes with a normally distributed feature) is given by

$$\hat{\theta}^* \pm z_{\alpha/2} \sqrt{\widehat{Var}(\hat{\theta}^*)} \qquad (2.26)$$




where $\hat{\theta}^*$ may be found with Equation 2.7 (for either optimal threshold, by considering the appropriate adjacent classes) [36, 56]. Using the delta method, the variance of $\hat{\theta}^*$ is approximated with


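$$Var(\hat{\theta}^*) \approx \left( \frac{\partial \theta^*}{\partial \mu_1} \right)^2 Var(\hat{\mu}_1) + \left( \frac{\partial \theta^*}{\partial \sigma_1} \right)^2 Var(\hat{\sigma}_1) + \left( \frac{\partial \theta^*}{\partial \mu_2} \right)^2 Var(\hat{\mu}_2) + \left( \frac{\partial \theta^*}{\partial \sigma_2} \right)^2 Var(\hat{\sigma}_2) \qquad (2.27)$$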


The  partial  derivatives  required  for  this  approximation  are 


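$$\frac{\partial \theta^*}{\partial \mu_1} = \frac{b^2 - ab(rad)^{-1/2}}{b^2 - 1} \qquad (2.28)$$

$$\frac{\partial \theta^*}{\partial \mu_2} = \frac{-1 + ab(rad)^{-1/2}}{b^2 - 1} \qquad (2.29)$$

$$\frac{\partial \theta^*}{\partial \sigma_1} = \frac{-2ab^2 + b(b^2 + 1)(rad)^{1/2}}{(b^2 - 1)^2 \sigma_1} - \frac{\sigma_1 b (rad)^{-1/2}}{b^2 - 1} \left( \ln(b^2) + b^2 - 1 \right) \qquad (2.30)$$

$$\frac{\partial \theta^*}{\partial \sigma_2} = \frac{2ab - (b^2 + 1)(rad)^{1/2}}{(b^2 - 1)^2 \sigma_1} + \frac{\sigma_2 b (rad)^{-1/2}}{b^2 - 1} \left( \ln(b^2) + 1 - b^{-2} \right) \qquad (2.31)$$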

where $a$, $b$, and $rad$ are defined as they were for Equation 2.25 [36, 56]. When there are three classes, the variance and partial derivatives of the second optimal threshold are estimated with Equation 2.27 and Equations 2.28 to 2.31 by replacing the first class with the second and the second class with the third [36, 56].
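A matching sketch for Equations 2.26 and 2.27 follows. It reuses the same rearranged threshold formula (an assumption about how Equation 2.7 is evaluated) and central differences in place of closed-form partials; when $s_1 \neq s_2$ the numerical partials can be compared against Equations 2.28 to 2.31, and at $s_1 = s_2$ they remain finite even though those closed forms have a $b^2 - 1$ denominator. The function names and toy data are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist, mean, stdev


def optimal_threshold(mu1, s1, mu2, s2):
    """Maximizer of Phi((t - mu1)/s1) - Phi((t - mu2)/s2), assuming mu1 < mu2 (finite when s1 == s2)."""
    a = mu2 - mu1
    log_b2 = log((s2 / s1) ** 2)
    rad = a * a + (s2 ** 2 - s1 ** 2) * log_b2
    return mu1 + (a * a + s2 ** 2 * log_b2) / (a + (s2 / s1) * sqrt(rad))


def threshold_gradient(params, h=1e-5):
    """Central-difference partials of theta* w.r.t. (mu1, sigma1, mu2, sigma2)."""
    grad = []
    for idx in range(4):
        up, down = list(params), list(params)
        up[idx] += h
        down[idx] -= h
        grad.append((optimal_threshold(*up) - optimal_threshold(*down)) / (2 * h))
    return grad


def delta_ci_threshold(x, y, alpha=0.05):
    """Delta-method CI for the optimal threshold (Eqs. 2.26-2.27)."""
    params = [mean(x), stdev(x), mean(y), stdev(y)]
    n1, n2 = len(x), len(y)
    variances = [params[1] ** 2 / n1, params[1] ** 2 / (2 * (n1 - 1)),
                 params[3] ** 2 / n2, params[3] ** 2 / (2 * (n2 - 1))]
    var_t = sum(g ** 2 * v for g, v in zip(threshold_gradient(params), variances))
    t_hat = optimal_threshold(*params)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(var_t)
    return t_hat, var_t, (t_hat - half_width, t_hat + half_width)


print(delta_ci_threshold([-1.0, 1.0] * 25, [1.0, 3.0] * 25))
```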


Under the framework of the two-class GYI, the delta method has been used for developing a CI around the optimal threshold for a classification system which utilizes a single normally or lognormally distributed feature (but not for the GYI itself) [30]. For a CI around the optimal threshold found with the GYI, the delta method CI is developed similarly to that for $J$, although the expression allows for the cost/benefit weighting factor. When the variances are equal,

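$$Var(\hat{\theta}^*) \approx \left( \frac{2\hat{\sigma} \ln(r)}{\hat{\mu}_2 - \hat{\mu}_1} \right)^2 Var(\hat{\sigma}) + \left( \frac{1}{2} + \frac{\hat{\sigma}^2 \ln(r)}{(\hat{\mu}_2 - \hat{\mu}_1)^2} \right)^2 Var(\hat{\mu}_1) + \left( \frac{1}{2} - \frac{\hat{\sigma}^2 \ln(r)}{(\hat{\mu}_2 - \hat{\mu}_1)^2} \right)^2 Var(\hat{\mu}_2) \qquad (2.32)$$

where $r$ is determined by the GYI cost/benefit weighting factor [30].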


This approximation may be used in Equation 2.26 to construct the CI around the optimal threshold [30]. This CI has also been generalized to the case of unequal variances [64]. Further, the delta method has been used to derive CIs for $J$ and the optimal threshold when the classification system utilizes a single feature to distinguish between two classes and the feature follows an independent gamma distribution for each class [56].

In [43], the delta method CIs for the two-class $J$ and the optimal threshold are modified by utilizing a second-order Taylor series expansion as opposed to the first-order expansion used in Equations 2.24 and 2.27. Although the extension to the delta method is presented, the performance of the extended version is not compared to that of the simpler method, and therefore the more complicated derivation has not been justified. All delta method CIs require large sample sizes if the desired coverage probability is to be achieved.

Generalized CIs (GCIs) are developed in [33] for $J$ and the optimal threshold under the assumption of a single feature used for classification between two classes, where the feature is independently and normally distributed for each class. These exact CIs outperform the delta method CIs for the scenarios considered in the simulation presented in [33] because they meet the desired coverage (even for small samples, $n_j > 10$) while maintaining a CI length that is less than the delta method CI length. This generalized method for classes with a normally distributed feature is also used for constructing a CI on the difference in paired Youden indices in the two-class framework, allowing for the comparison of two classification systems' performances in a paired data structure [80].

If no assumptions are made about the distribution of the feature used for classification, a nonparametric CI around $J$ and the optimal threshold may be used. In [79], a CI for the two-class $J$ is developed with the Agresti-Coull confidence interval for a binomial proportion (see [2]), where $J$ is estimated with


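$$\hat{J} = \max_{\theta \in \Theta} \left[ \frac{1}{n_1} \sum_{i=1}^{n_1} I(X_i \le \theta) + \frac{1}{n_2} \sum_{j=1}^{n_2} I(Y_j > \theta) - 1 \right] \qquad (2.33)$$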


A nonparametric asymptotic normal (AN) bootstrap is utilized to determine the CI bounds for $J$ estimated in Equation 2.33 (this method does not provide a CI for the optimal threshold). Under various distributional assumptions, this method approaches the desired coverage probability for $n_j > 50$. In [43], an empirical likelihood method which utilizes bootstraps is used for constructing a nonparametric CI around $J$ and the optimal threshold in the two-class framework. This nonparametric method performs well with respect to coverage for samples of at least 30 in each class.
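A sketch of the empirical estimate in Equation 2.33 paired with a nonparametric asymptotic normal bootstrap is shown below. It assumes class 1 tends to have the lower feature values, resamples each class separately, and uses $\hat{J} \pm z_{\alpha/2}\,\widehat{SE}_{boot}$ as the interval; the number of resamples, seed, function names, and data are illustrative, and the details may differ from the implementation in [79].

```python
import random
from bisect import bisect_right
from statistics import NormalDist, stdev


def empirical_youden(x, y):
    """Eq. 2.33: max over thresholds of F_x(t) - F_y(t), with class 1 (x) on the low side."""
    xs, ys = sorted(x), sorted(y)
    best = 0.0  # a threshold below all data gives F_x - F_y = 0
    for t in set(xs) | set(ys):
        best = max(best, bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
    return best


def an_bootstrap_ci(x, y, n_boot=500, alpha=0.05, seed=0):
    """Asymptotic normal bootstrap CI: J_hat +/- z * (bootstrap standard error)."""
    rng = random.Random(seed)
    j_hat = empirical_youden(x, y)
    replicates = [empirical_youden(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
                  for _ in range(n_boot)]
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * stdev(replicates)
    return j_hat, (j_hat - half_width, j_hat + half_width)


x = [1.0, 2.0, 3.0, 6.0]
y = [4.0, 5.0, 7.0, 8.0]
print(an_bootstrap_ci(x, y))
```

With samples this small the normal approximation is poor and the upper bound can exceed 1; the example only shows the mechanics.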

Currently, a confidence interval around the GYI has not been presented, except for a bootstrap CI [40].

2.5.2 Confidence on Bayes Cost and Optimal Thresholds.

The  two-class  J  may  be  written  as 

$$J = \max_{\theta \in \Theta} \left[ P_{1|1}(\theta) + P_{2|2}(\theta) - 1 \right] \qquad (2.34)$$

where $P_{1|1}(\theta)$ and $P_{2|2}(\theta)$ are the correct classification probabilities for a threshold $\theta \in \Theta$. The two-class BC (with all $c_{i|j}p_j$ assumed to be one, for $i \neq j$) may be written

$$BC = \min_{\theta \in \Theta} \left[ P_{2|1}(\theta) + P_{1|2}(\theta) \right] \qquad (2.35)$$

where $P_{2|1}(\theta)$ and $P_{1|2}(\theta)$ are the misclassification probabilities for $\theta \in \Theta$. For greater utility, BC is defined with prevalences on the two classes and different costs on the misclassification errors [58, 65]:

$$BC = \min_{\theta \in \Theta} \left[ c_{2|1}\, p_1\, P_{2|1}(\theta) + c_{1|2}\, p_2\, P_{1|2}(\theta) \right] \qquad (2.36)$$

where $c_{i|j}$ is the fixed cost associated with misclassifying class $j$ as class $i$ and $p_j$ is the fixed prevalence for class $j$.

From these definitions it can be shown that, for a two-class classification system, the optimal threshold found by minimizing BC is equivalent to the optimal threshold found by maximizing $J$ (when all $c_{i|j}p_j$ are equal, for $i \neq j$; Theorem 2) or by maximizing the GYI (Theorem 3) when the costs are defined as

$$c^{GYI}_{1|2} - c^{GYI}_{2|2} = c^{BC}_{1|2}, \qquad c^{GYI}_{2|1} - c^{GYI}_{1|1} = c^{BC}_{2|1} \qquad (2.37)$$

where $c^{GYI}$ and $c^{BC}$ are the costs associated with the GYI and BC, respectively. Then, a CI around the optimal threshold found by minimizing BC would be equivalent to the CIs developed for the optimal threshold found with $J$ or the GYI (assuming the same statistical method for constructing the CI is used).

Theorem 2. The optimal threshold, $\theta^*_{BC}$, found by minimizing Bayes Cost when all $c_{i|j}p_j$ are assumed equal to one, for $i \neq j$, is equivalent to the optimal threshold found by maximizing the Youden index, $\theta^*_J = \theta^*_{BC}$, for a two-class classification system family.

Proof. Let $\theta^*_{BC}$ and $\theta^*_J$ represent the optimal thresholds found by minimizing Bayes Cost and maximizing the Youden index, respectively. Also, let $P_{i|j}(\theta)$ represent the probability of classifying class $j$ as class $i$. Then there exists $\theta^*_{BC}$ such that $BC = \min_{\theta \in \Theta} [P_{2|1}(\theta) + P_{1|2}(\theta)]$, and there exists $\theta^*_J$ such that $J = \max_{\theta \in \Theta} [P_{1|1}(\theta) + P_{2|2}(\theta) - 1]$. Now, consider

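$$
\begin{aligned}
\theta^*_{BC} &= \arg\min_{\theta \in \Theta} \left[ P_{2|1}(\theta) + P_{1|2}(\theta) \right] \\
&= \arg\max_{\theta \in \Theta} \left[ 1 - P_{2|1}(\theta) - P_{1|2}(\theta) \right] \\
&= \arg\max_{\theta \in \Theta} \left[ 1 - (1 - P_{1|1}(\theta)) - (1 - P_{2|2}(\theta)) \right] \\
&= \arg\max_{\theta \in \Theta} \left[ 1 - 1 + P_{1|1}(\theta) - 1 + P_{2|2}(\theta) \right] \\
&= \arg\max_{\theta \in \Theta} \left[ P_{1|1}(\theta) + P_{2|2}(\theta) - 1 \right] \\
&= \theta^*_J
\end{aligned} \qquad (2.38)
$$

Therefore, $\theta^*_J = \theta^*_{BC}$. □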

Theorem 3. The optimal threshold, $\theta^*_{BC}$, found by minimizing Bayes Cost is equivalent to the optimal threshold found by maximizing the generalized Youden index, $\theta^*_{GYI} = \theta^*_{BC}$, for a two-class classification system when the costs are defined as in Equation 2.37.

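Proof. Let $\theta^*_{BC}$ and $\theta^*_{GYI}$ represent the optimal thresholds found by minimizing BC and maximizing the GYI, respectively. Also, let $c_{i|j}$ be the fixed cost associated with classifying class $j$ as class $i$, $p_j$ be the fixed prevalence for class $j$ (so that $p_2 = 1 - p_1$), and $P_{i|j}(\theta)$ be the probability of classifying class $j$ as class $i$ for a given $\theta \in \Theta$. Assume there exists $\theta^*_{GYI}$ such that

$$GYI = \max_{\theta \in \Theta} \left[ \text{sensitivity}(\theta) + \frac{p_2}{p_1} \times \frac{c^{GYI}_{2|2} - c^{GYI}_{1|2}}{c^{GYI}_{1|1} - c^{GYI}_{2|1}} \times \text{specificity}(\theta) - 1 \right]$$

and there exists $\theta^*_{BC}$ such that $BC = \min_{\theta \in \Theta} \left[ p_1 c^{BC}_{2|1} P_{2|1}(\theta) + p_2 c^{BC}_{1|2} P_{1|2}(\theta) \right]$. Then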


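$$
\begin{aligned}
\theta^*_{GYI} &= \arg\max_{\theta \in \Theta} \left[ \text{sensitivity}(\theta) + \frac{p_2}{p_1} \times \frac{c^{GYI}_{2|2} - c^{GYI}_{1|2}}{c^{GYI}_{1|1} - c^{GYI}_{2|1}} \times \text{specificity}(\theta) - 1 \right] \\
&= \arg\max_{\theta \in \Theta} \left[ P_{1|1}(\theta) + \frac{p_2}{p_1} \times \frac{c^{GYI}_{2|2} - c^{GYI}_{1|2}}{c^{GYI}_{1|1} - c^{GYI}_{2|1}} \times P_{2|2}(\theta) - 1 \right] \\
&= \arg\max_{\theta \in \Theta} \left[ (1 - P_{2|1}(\theta)) + \frac{p_2}{p_1} \times \frac{c^{GYI}_{2|2} - c^{GYI}_{1|2}}{c^{GYI}_{1|1} - c^{GYI}_{2|1}} \times (1 - P_{1|2}(\theta)) - 1 \right] \\
&= \arg\max_{\theta \in \Theta} \left[ -P_{2|1}(\theta) + \frac{p_2}{p_1} \times \frac{c^{GYI}_{2|2} - c^{GYI}_{1|2}}{c^{GYI}_{1|1} - c^{GYI}_{2|1}} \times (1 - P_{1|2}(\theta)) \right] \\
&= \arg\max_{\theta \in \Theta} \left[ -P_{2|1}(\theta) + \frac{p_2}{p_1} \times \frac{c^{BC}_{1|2}}{c^{BC}_{2|1}} \times (1 - P_{1|2}(\theta)) \right] \\
&= \arg\max_{\theta \in \Theta} \frac{1}{p_1 c^{BC}_{2|1}} \left[ -p_1 c^{BC}_{2|1} P_{2|1}(\theta) - p_2 c^{BC}_{1|2} P_{1|2}(\theta) + p_2 c^{BC}_{1|2} \right] \\
&= \arg\max_{\theta \in \Theta} \left[ -p_1 c^{BC}_{2|1} P_{2|1}(\theta) - p_2 c^{BC}_{1|2} P_{1|2}(\theta) + \text{constant} \right] \\
&= \arg\max_{\theta \in \Theta} \left[ -p_1 c^{BC}_{2|1} P_{2|1}(\theta) - p_2 c^{BC}_{1|2} P_{1|2}(\theta) \right] \\
&= \arg\min_{\theta \in \Theta} \left[ p_1 c^{BC}_{2|1} P_{2|1}(\theta) + p_2 c^{BC}_{1|2} P_{1|2}(\theta) \right] \\
&= \theta^*_{BC}
\end{aligned} \qquad (2.39)
$$

Therefore, $\theta^*_{GYI} = \theta^*_{BC}$. □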
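The equivalence in Theorems 2 and 3 can also be checked numerically. The sketch below assumes two normal classes, the threshold rule "label class 1 if $x \le \theta$", and arbitrary prevalences and BC costs; the GYI weight is built from Equation 2.37 as $(p_2/p_1)(c^{BC}_{1|2}/c^{BC}_{2|1})$, and both objectives are then optimized over the same grid. Setting both products $c_{i|j}p_j$ equal reduces the check to Theorem 2. All names and values are illustrative.

```python
from statistics import NormalDist


def gyi_objective(t, d1, d2, w):
    """sensitivity + w * specificity - 1 with sensitivity = P_{1|1}, specificity = P_{2|2}."""
    return d1.cdf(t) + w * (1.0 - d2.cdf(t)) - 1.0


def bc_objective(t, d1, d2, p1, p2, c21, c12):
    """p1 c_{2|1} P_{2|1}(t) + p2 c_{1|2} P_{1|2}(t), as in Eq. 2.36."""
    return p1 * c21 * (1.0 - d1.cdf(t)) + p2 * c12 * d2.cdf(t)


d1, d2 = NormalDist(0, 1), NormalDist(2, 1.5)  # hypothetical classes
p1, p2, c21, c12 = 0.3, 0.7, 2.0, 1.0          # hypothetical prevalences and BC costs
w = (p2 / p1) * (c12 / c21)                    # GYI weight implied by Eq. 2.37

grid = [-3 + 0.001 * s for s in range(8001)]
t_gyi = max(grid, key=lambda t: gyi_objective(t, d1, d2, w))
t_bc = min(grid, key=lambda t: bc_objective(t, d1, d2, p1, p2, c21, c12))
print(t_gyi, t_bc)  # coincide up to grid resolution
```

Algebraically, $GYI(\theta) + BC(\theta)/(p_1 c^{BC}_{2|1}) = w$ for every $\theta$, which is the identity the proof exploits.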


A delta method CI for the optimal threshold found by minimizing BC is presented in [63] for a classification system with two classes and a single feature that is independently and normally distributed for each class. This CI is equivalent to the delta method CI for the optimal threshold found with the GYI (Section 2.5.1) when costs are defined with Equation 2.37.

CIs on the optimal thresholds found by minimizing BC in a multi-state setting are derived using the delta method and numerical approximations in [65]. Notably, the CI for BC was not derived. However, in a three-class scenario, CIs on the two threshold values may not necessarily correspond to confidence around the optimal point. A specific set of thresholds $\{\theta_1, \theta_2\}$ from the CIs around each individual threshold may be a hidden extrapolation outside the optimal threshold region. Therefore, CIs around the optimal thresholds may not be the ideal method for quantifying uncertainty in the optimal point, especially in a multi-state setting with more than one threshold. To quantify uncertainty in the optimal point, CIs around the optimal point metric ($J$ or BC) should be considered.

To further motivate a CI around the optimal point metric as opposed to the optimal thresholds only, consider the following example. Assume a random draw from a classification system with three classes (assume samples of size 50 are taken from each class), where $X_1 \sim N(-3, 1)$, $X_2 \sim N(0, 1)$, and $X_3 \sim N(3, 1)$. The delta method CIs around the two optimal thresholds found to distinguish between the classes may be $\theta_1 \in [-1.95, -1.49]$ and $\theta_2 \in [1.45, 2.12]$ using the method in [65]. Given the estimated normal distributions from the sample, this range of thresholds would correspond to BC values from 0.207 to 0.226. However, for the same sample, the delta method CI around BC (developed in Section 3.2) is $BC \in [0.114, 0.301]$, and the true value of BC from the assumed underlying distributions is 0.27. Therefore, values within the thresholds' CIs do not necessarily reflect all the uncertainty in the optimal performance of the system (measured by BC) and, in this example, overestimate the system's performance.

Notably, the CIs around the thresholds are of use once a classification system has been chosen for implementation. Before a classification system is chosen, however, it may be of interest to compare competing systems based on their optimal performance in order to choose the system with the most powerful classification ability. By constructing a CI around each classification system's BC value, performance at the optimal settings can be compared between systems. Currently, methods for CIs around BC do not exist.

2.6  Hypothesis  Tests  for  Optimal  Point  Metrics 

A hypothesis "is a statement about a population parameter" [12, p. 373]. In testing a hypothesis there are two hypotheses, the null hypothesis ($H_0: \theta \in \Theta_0$) and the alternative hypothesis ($H_1: \theta \in \Theta_0^c$). Both of these hypotheses make statements about the parameter space of interest such that, combined, they cover the entire parameter space [34, p. 60]. Of interest for this work are hypotheses about metrics of a classification system, such as $J$ or BC. Such a hypothesis might be constructed to test whether a classification system meets some desired level of performance, for instance, to determine whether a classification system performs better than chance.

There are two types of errors which may occur when testing a hypothesis. A Type I error occurs if the null hypothesis is rejected when it is actually true (i.e., $\theta \in \Theta_0$). A Type II error occurs when the null hypothesis is not rejected when it is false (i.e., $\theta \in \Theta_0^c$). Clearly, it is ideal to minimize the probability of committing either of the two errors. However, there is a trade-off between the two errors. Therefore, a level of significance of the test ($\alpha \in [0, 1]$) is usually set such that the probability of a Type I error is less than or equal to the level of significance for all $\theta \in \Theta_0$ [34, p. 61]. Then, among all tests with the desired level of significance, the test which minimizes the probability of a Type II error would be best.

Although  tests  of  hypotheses  on  J  or  BC  would  be  useful  when  selecting  a  classification  system, 
such  tests  have  not  been  developed. 


2.7  Distributions  for  the  Youden  Index  and  Bayes  Cost  Inference 

Making no distributional assumptions about a classification system, the classification outcomes with respect to truth can be modeled as binomial or multinomial random variables, for $k = 2$ or $k \ge 3$ classes, respectively. Therefore, background information on these distributions is presented in this section.


2.7.1 Binomial Distribution.

The classification outcomes from a two-class classification system, for a fixed $\theta \in \Theta$, are arranged in a contingency table in Table 2.3, where $X_{i|j}$ denotes the number of observations classified into class $i$ with truth class $j$. The sample drawn from each class is fixed; consequently, knowledge of the total number of correct (or incorrect) observations in a class explicitly determines the other. For that reason, the correct or incorrect classification observations from each class are modeled as binomial($n_j$, $p_{i|j}$), where $p_{i|j}$ is the true population probability for the outcome of interest and $n_j$ is the fixed number sampled from the class, $j = 1, 2$. The binomial probability mass function (pmf) is given, generally, by


$$f_X(x \mid n, p) = P(X = x \mid n, p) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n, \quad 0 \le p \le 1 \qquad (2.40)$$


The maximum likelihood estimate (MLE) of $p$ is $\hat{p} = x/n$.

Table 2.3: Contingency table for a two-class classification system. Column labels represent truth and row labels represent the label given by the classification system.

              Class 1      Class 2
  Test = 1    $X_{1|1}$    $X_{1|2}$
  Test = 2    $X_{2|1}$    $X_{2|2}$

2.7.1.1 Confidence Interval for Binomial Proportions.

Clopper and Pearson derived an exact CI for a binomial probability using fiducial limits in 1934 [13]. The Clopper-Pearson $(1 - \alpha)100\%$ CI for $p$, from an observed statistic $y$ (the number of successes) of a binomial distribution with $f_Y(y \mid p)$ defined as the binomial pmf, is found by solving the following two equations for the lower and upper bounds ($p_L$ and $p_U$, respectively):

$$\sum_{k=y}^{n} f_Y(k \mid p_L) = \frac{\alpha}{2} \qquad (2.41)$$

$$\sum_{k=0}^{y} f_Y(k \mid p_U) = \frac{\alpha}{2} \qquad (2.42)$$

The sample space is $y \in \{0, \ldots, n\}$. When $y = 0$ or $y = n$ is observed, a solution cannot be found for one of the two equations above (2.41 and 2.42), and the lower bound is set to 0 or the upper bound is set to 1, respectively [3, p. 18]. This last condition is necessary because these extreme values of $Y$ cause one of the summations to equal 1 for any $p$, due to the property of a pmf that

$$\sum_{y \in \mathcal{Y}} f_Y(y \mid \theta) = 1 \qquad (2.43)$$


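for any $\theta$. The closed-form solution of the Clopper-Pearson interval for a binomial probability is

$$\left[ 1 + \frac{n - x + 1}{x\, F_{2x,\, 2(n - x + 1),\, 1 - \alpha/2}} \right]^{-1} < p < \left[ 1 + \frac{n - x}{(x + 1)\, F_{2(x + 1),\, 2(n - x),\, \alpha/2}} \right]^{-1} \qquad (2.44)$$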


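where $x$ is the observed number of successes ($x = 1, 2, \ldots, n - 1$), $F_{\nu_1, \nu_2, \gamma}$ denotes the upper-$\gamma$ critical value of the $F$ distribution with $\nu_1$ and $\nu_2$ degrees of freedom, and this interval has a coverage probability of at least $(1 - \alpha)100\%$ for all $p$ [2].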
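Equations 2.41 and 2.42 can also be solved directly by bisection, which avoids the $F$ quantiles in Equation 2.44. The sketch below uses only the binomial pmf (`math.comb` needs Python 3.8 or later); the function names and the fixed number of bisection steps are illustrative choices. Because the two tail probabilities are monotone in $p$, bisection is guaranteed to converge to the bounds.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_at_least(x, n, p):
    """P(Y >= x) for Y ~ binomial(n, p); increasing in p."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))


def prob_at_most(x, n, p):
    """P(Y <= x) for Y ~ binomial(n, p); decreasing in p."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))


def bisect_increasing(f, lo=0.0, hi=1.0, steps=100):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) CI solving Eqs. 2.41 and 2.42; boundary cases as in the text."""
    p_lower = 0.0 if x == 0 else bisect_increasing(lambda p: prob_at_least(x, n, p) - alpha / 2)
    p_upper = 1.0 if x == n else bisect_increasing(lambda p: alpha / 2 - prob_at_most(x, n, p))
    return p_lower, p_upper


print(clopper_pearson(5, 10))  # approximately (0.187, 0.813)
```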

2.7.2 Multinomial Distribution.

A $k \times k$ contingency table is used for arranging the outcomes of a $k$-class classification system for a fixed $\theta \in \Theta$ (Table 2.4). The multivariate random variable $\mathbf{X}_j = (X_{1|j}, X_{2|j}, \ldots, X_{k|j})$ represents the $k$ outcomes from a single class sampled $n_j$ times and is distributed multinomial($n_j$, $\mathbf{p}_j = (p_{1|j}, p_{2|j}, \ldots, p_{k|j})$), where $p_{i|j}$ represents the true probability for class $j$ to be classified as class $i$, $\sum_{i=1}^{k} X_{i|j} = n_j$, and $\sum_{i=1}^{k} p_{i|j} = 1$. Also, each observation can only be classified as one outcome, so that, for a single observation, $E[x_{i|j} \times x_{i'|j}] = 0$ for $i \neq i'$.

Table 2.4: Contingency table for a $k$-class classification system. Columns represent truth and rows represent the label given by the classification system.

              Class 1      Class 2      Class 3      ...    Class k
  Test = 1    $X_{1|1}$    $X_{1|2}$    $X_{1|3}$    ...    $X_{1|k}$
  Test = 2    $X_{2|1}$    $X_{2|2}$    $X_{2|3}$    ...    $X_{2|k}$
  Test = 3    $X_{3|1}$    $X_{3|2}$    $X_{3|3}$    ...    $X_{3|k}$
  ...         ...          ...          ...          ...    ...
  Test = k    $X_{k|1}$    $X_{k|2}$    $X_{k|3}$    ...    $X_{k|k}$

The multinomial pmf is

$$f_{\mathbf{X}}(\mathbf{x} \mid n, \mathbf{p}) = P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k \mid n, \mathbf{p}) = \frac{n!}{\prod_{i=1}^{k} x_i!} \prod_{i=1}^{k} p_i^{x_i}, \quad \text{where } x_i \in \{0, \ldots, n\} \text{ and } \sum_{i=1}^{k} x_i = n \qquad (2.45)$$


Each $X_i$ considered individually (collapsing among the other $k - 1$ $X$'s within class $j$) is distributed binomial($n$, $p_i$). However, when considering all classification outcomes simultaneously, the multinomial distribution is used, as it allows for consideration of multiple classification outcomes at once and accounts for the covariance structure between outcomes within a class. The MLEs of the multinomial parameters are

$$\hat{\mathbf{p}} = (\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_k) = \left( \frac{X_1}{n}, \frac{X_2}{n}, \ldots, \frac{X_k}{n} \right) \qquad (2.46)$$

where each $X_i$ is the $i$th observed outcome count and $n$ is the total sample size [3, p. 21].

2.7.2.1 Confidence Intervals for Multinomial Proportions.

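In this section, methods available for simultaneous CIs for multinomial probabilities and linear combinations of multinomial probabilities are presented. In 1963, Gold introduced a CI for the linear combination of multinomial probabilities. Letting $\mathbf{l} = (l_1, \ldots, l_k)$ denote the linear multipliers for each probability:

$$\sum_{i=1}^{k} l_i p_i \in \sum_{i=1}^{k} l_i \hat{p}_i \pm \sqrt{\chi^2_{k-1, \alpha}} \sqrt{\frac{1}{n} \left[ \sum_{i=1}^{k} l_i^2 \hat{p}_i - \left( \sum_{i=1}^{k} l_i \hat{p}_i \right)^2 \right]} \qquad (2.47)$$

[26][53, p. 217]. Gold also extended this to all linear combinations of several populations of multinomial probabilities, $p_{ij}$, as

$$\sum_{i,j} l_{ij} p_{ij} \in \sum_{i,j} l_{ij} \hat{p}_{ij} \pm \sqrt{\chi^2_{r(k-1), \alpha}}\; \hat{\sigma} \qquad (2.48)$$

where $j$ denotes the $r$ populations ($j = 1, \ldots, r$), $i$ denotes the $k$ categories ($i = 1, \ldots, k$), and

$$\hat{\sigma}^2 = \sum_{j=1}^{r} \frac{1}{n_j} \left[ \sum_{i=1}^{k} l_{ij}^2 \hat{p}_{ij} - \left( \sum_{i=1}^{k} l_{ij} \hat{p}_{ij} \right)^2 \right] \qquad (2.49)$$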

[53, p. 219]. When the linear combinations considered are contrasts, the degrees of freedom are reduced from $r(k - 1)$ to $(r - 1)(k - 1)$, resulting in shorter intervals [27].

In 1964, Quesenberry and Hurst showed that the solutions to the following quadratic equations

$$(p_i - \hat{p}_i)^2 = \chi^2_{k-1, \alpha} \frac{p_i (1 - p_i)}{n}, \quad i = 1, \ldots, k \qquad (2.50)$$

produce simultaneous CIs around multinomial probabilities [51][53, p. 217].


Goodman (1965) used Bonferroni intervals, where

$$p_i \in \hat{p}_i \pm z_{\alpha/2k} \sqrt{\frac{\hat{p}_i (1 - \hat{p}_i)}{n}} \qquad (2.51)$$

[53, p. 216][71]. This is equivalent to a Wald CI with a Bonferroni correction for multiple comparisons, but it does not take into account the covariance between the multinomial parameters.

Fitzpatrick and Scott (1987) also introduced simultaneous CIs for multinomial parameters in [22], where

$$p_i \in \hat{p}_i \pm \frac{d}{2\sqrt{n}} \qquad (2.52)$$

and $d$ is a critical value chosen as described in [22].

All  of  these  previous  methods  were  developed  with  large  sample  theory. 

Finally, in 1995, Sison and Glaz determined that a simultaneous CI for multinomial parameters can be found by first determining the integer $c$ for which $\nu(c) \le 1 - \alpha < \nu(c+1)$, where

$$\nu(c) = P(x_i - c \le X_i^* \le x_i + c,\; i = 1, \ldots, k) \tag{2.53}$$

and $\mathbf X^*$ has a multinomial distribution with parameters $n$ and $\mathbf p = (p_1, \ldots, p_k)$ [62]. Then define

$$\gamma = \frac{(1-\alpha) - \nu(c)}{\nu(c+1) - \nu(c)} \tag{2.54}$$
and the following skewed confidence region is recommended:

$$\left\{\hat p_i - \frac{c}{n} \le p_i \le \hat p_i + \frac{c + 2\gamma}{n},\; i = 1, \ldots, k\right\} \tag{2.55}$$

[62]. Computing the CI in Equation 2.55 by hand may be difficult; however, the method has been implemented in SAS software [39]. The SAS code was later adapted into the MultinomialCI package for R, which makes this CI easy to use [52, 69].
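Because $\nu(c)$ is a multinomial probability, it can be computed exactly for small $n$ and $k$ by enumerating every outcome, which makes the steps in Equations 2.53 through 2.55 concrete. The Python sketch below is not the SAS or MultinomialCI code. The function names are hypothetical, the unknown $\mathbf p$ in Equation 2.53 is replaced by the observed proportions $\hat{\mathbf p}$ (an assumption needed to evaluate $\nu(c)$ from data), and the limits are clipped to $[0, 1]$ as an added convenience. The enumeration cost grows rapidly with $n$ and $k$, so the sketch is only practical for small examples.

```python
import math


def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_pmf(xs, probs):
    """P(X = xs) for a multinomial with n = sum(xs) trials."""
    coef = 1
    remaining = sum(xs)
    for x in xs:
        coef *= math.comb(remaining, x)
        remaining -= x
    prob = float(coef)
    for x, p in zip(xs, probs):
        prob *= p ** x
    return prob


def nu(c, counts, probs):
    """Eq. 2.53 by enumeration: P(x_i - c <= X_i <= x_i + c for all i)."""
    n = sum(counts)
    return sum(
        multinomial_pmf(ys, probs)
        for ys in compositions(n, len(counts))
        if all(abs(y - x) <= c for y, x in zip(ys, counts))
    )


def sison_glaz(counts, alpha=0.05):
    n = sum(counts)
    p_hat = [x / n for x in counts]
    # Find c with nu(c) <= 1 - alpha < nu(c + 1), assuming nu(0) <= 1 - alpha
    c = 0
    while nu(c + 1, counts, p_hat) <= 1 - alpha:
        c += 1
    nu_c, nu_c1 = nu(c, counts, p_hat), nu(c + 1, counts, p_hat)
    gamma = ((1 - alpha) - nu_c) / (nu_c1 - nu_c)                 # Eq. 2.54
    intervals = [
        (max(0.0, p - c / n), min(1.0, p + (c + 2 * gamma) / n))  # Eq. 2.55
        for p in p_hat
    ]
    return c, gamma, intervals


print(sison_glaz([10, 15, 5]))
```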

2.8  Summary 

Optimal points are important for classification systems, as they represent a system’s optimal performance with respect to classification accuracy. Metrics for characterizing the performance of a classification system’s optimal point are developed by maximizing correct classification probabilities or minimizing misclassification probabilities (J and BC, respectively). Minimizing the misclassification probabilities allows for more flexibility in the optimal point selection, and therefore is chosen as the focus of this work.

Little  work  has  been  done  previously  to  quantify  the  uncertainty  around  BC.  Thus,  methods 
for  quantifying  uncertainty  in  a  classification  system’s  BC  value  are  derived  and  presented  in 
the  following  chapters,  for  both  parametric  and  nonparametric  settings.  Confidence  intervals  and 
hypothesis  tests  are  developed  to  provide  a  range  of  flexible  inference  methods. 
III. Parametric Confidence Intervals


3.1  Introduction 

The purpose of this chapter is to derive CIs for BC, for any number of classes $k$, in order to quantify the optimal performance of a classification system and to compare systems based upon performance criteria. These methods are developed under the assumption of a single feature that is independently and normally distributed for each class, because the feature used for classification is often assumed to follow a continuous distribution, most commonly the normal [23, 30, 33, 36, 40, 45, 47, 49, 54-56, 58, 64, 65, 75, 79, 80]. Placing a parametric assumption on the feature distributions allows for the use of convenient statistical methods for the evaluation of the classification system, with accurate results when the parametric assumptions are correct. The assumption of a normally distributed feature is also useful because transformations to normality are commonplace when the feature follows a skewed continuous distribution, such as the gamma or log-normal [45].

In  Section  3.2,  the  delta  method  is  used  to  approximate  the  variance  of  BC  and  the  optimal 
thresholds  for  the  development  of  their  associated  CIs.  A  numerical  estimation  technique  is  also 
presented  as  a  method  for  efficiently  estimating  the  partial  derivatives  that  are  required  for  the  delta 
method  approximations.  Numerical  estimation  is  especially  useful  (and  necessary)  when  there  are 
more  than  two  classes,  as  it  can  be  used  to  solve  equations  which  are  difficult  or  impossible  to  solve 
analytically, while remaining very accurate [25, p. 1]. In fact, the optimal thresholds for BC must be found numerically (when the weights on the misclassification probabilities are not equal), and therefore their partial derivatives with respect to the normal distribution parameters ($2k^2 - 2k$ of them) must also be estimated numerically. Although the $2k$ partial derivatives of BC with respect to the normal distribution parameters can be found analytically, the derivation becomes cumbersome for large $k$. Therefore, numerical estimation techniques allow for easy extension of the delta method CIs to $k$ classes.

In Section 3.3, GCIs are derived for the $k$-class J and BC, again assuming a single feature that is independently and normally distributed for each class. Although CIs for BC are the focus of this work, the GCI for the extended J is also presented, as it is not currently available in the literature. GCIs for the optimal thresholds are also presented. In Section 3.4, available bootstrap CI methods which may be used when the classification system is developed with parametric assumptions are discussed. Simulation results are presented in Section 3.5. The simulation examines the performance of the delta and generalized CIs, and compares these CIs’ performance to that of available bootstrap CIs. Specifically, coverage probability, coverage symmetry, length of CIs, and bias of BC are assessed under a variety of classification system settings, including varying distributional parameters and costs. Finally, the results are summarized in Section 3.6.

3.2  Delta  Method  Confidence  Intervals 

The  delta  method  uses  the  first  order  Taylor  series  expansion  to  estimate  the  variance  of 
functions  of  parameters  [12,  p.242].  A  multivariate  version  of  the  delta  method  is  given  in  the 
following  theorem. 

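Theorem 4 (Multivariate Delta Method).

Suppose that $\hat{\boldsymbol\theta}$ is Asymptotic-Normal$_k(\boldsymbol\theta, b_n^2\boldsymbol\Sigma)$ with $b_n \to 0$, and that $g$ is a real-valued function with partial derivatives existing in a neighborhood of $\boldsymbol\theta$ and continuous at $\boldsymbol\theta$, with $g'(\boldsymbol\theta) = \partial g(\boldsymbol\theta)/\partial\boldsymbol\theta$ not identically zero. Then as $n \to \infty$,

$$g(\hat{\boldsymbol\theta}) \text{ is Asymptotic-Normal}\left(g(\boldsymbol\theta),\; b_n^2\, g'(\boldsymbol\theta)^{T}\boldsymbol\Sigma\, g'(\boldsymbol\theta)\right)$$

[8, p. 238]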

Often $b_n$ is taken to be $1/\sqrt{n}$ [8, p. 238]. In Theorem 4, $\boldsymbol\theta$ is used to represent any vector of statistical parameters. This theorem is applied to BC and to the optimal threshold values, $\theta_m^*$, which are both functions of $(\boldsymbol\mu, \boldsymbol\sigma^2)$.
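Theorem 4 reduces to the quadratic form $g'(\boldsymbol\theta)^T \operatorname{Cov}(\hat{\boldsymbol\theta})\, g'(\boldsymbol\theta)$. The minimal Python sketch below evaluates that form with a central-difference gradient; the function name is hypothetical, `cov` is taken to be the covariance matrix of $\hat{\boldsymbol\theta}$ (that is, $b_n^2\boldsymbol\Sigma$), and the numerical gradient is a convenience choice rather than part of the theorem.

```python
def delta_method_variance(g, theta, cov, eps=1e-5):
    """Approximate Var[g(theta_hat)] by g'(theta)^T Cov(theta_hat) g'(theta).

    The gradient g'(theta) is estimated with central differences, and
    cov is the covariance matrix of theta_hat (b_n^2 * Sigma in Theorem 4).
    """
    k = len(theta)
    grad = []
    for i in range(k):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((g(up) - g(down)) / (2 * eps))
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))


# Example: g(theta) = theta_0 * theta_1 with independent estimates
var = delta_method_variance(lambda t: t[0] * t[1], [2.0, 3.0],
                            [[0.1, 0.0], [0.0, 0.2]])
print(var)  # approximately 3^2 * 0.1 + 2^2 * 0.2 = 1.7
```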

3.2.1  Bayes  Cost  and  Optimal  Thresholds,  3  classes. 

Recall that if the classification system is developed using a single feature for the classification of three classes, where the feature is independently and normally distributed for each class with $\mu_1 < \mu_2 < \mu_3$ and with two threshold values used to distinguish between the classes (denoted $\theta_1 < \theta_2$), BC can be expressed using the standard normal CDF and the optimal threshold values that minimize BC as:


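$$\begin{aligned}
BC_3 ={}& c_{2|1}p_1\left[\Phi\left(\frac{\theta_2^*-\mu_1}{\sigma_1}\right) - \Phi\left(\frac{\theta_1^*-\mu_1}{\sigma_1}\right)\right] + c_{3|1}p_1\,\Phi\left(\frac{\mu_1-\theta_2^*}{\sigma_1}\right)\\
&+ c_{1|2}p_2\,\Phi\left(\frac{\theta_1^*-\mu_2}{\sigma_2}\right) + c_{3|2}p_2\,\Phi\left(\frac{\mu_2-\theta_2^*}{\sigma_2}\right)\\
&+ c_{1|3}p_3\,\Phi\left(\frac{\theta_1^*-\mu_3}{\sigma_3}\right) + c_{2|3}p_3\left[\Phi\left(\frac{\theta_2^*-\mu_3}{\sigma_3}\right) - \Phi\left(\frac{\theta_1^*-\mu_3}{\sigma_3}\right)\right]
\end{aligned} \tag{2.21}$$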


Note that the minimization is not expressed in Equation 2.21 because the equation uses the optimal thresholds ($\theta_1^* < \theta_2^*$). The optimal thresholds must be estimated numerically when the $c_{i|j}p_j$, $i \ne j$, are not all equal. For BC defined in Equation 2.21, $\widehat{BC} = g(\bar{\mathbf x}, \mathbf S^2)$. Since $(\bar{\mathbf x}, \mathbf S^2)$ is asymptotically multivariate normal, the multivariate delta method may be applied (see Appendix A.1 for the asymptotic properties of $(\bar{\mathbf x}, \mathbf S^2)$). Therefore, by Theorem 4, $\widehat{BC}$ is Asymptotic-Normal$[BC, Var(\widehat{BC})]$, and the variance of $\widehat{BC}$ from Equation 2.21 is estimated according to the delta method using the following equation:


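$$\widehat{Var}(\widehat{BC}_3) = \sum_{j=1}^{3}\left[\left(\frac{\partial BC_3}{\partial \mu_j}\right)^2 Var(\hat\mu_j) + \left(\frac{\partial BC_3}{\partial \sigma_j}\right)^2 Var(\hat\sigma_j)\right] \tag{3.1}$$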


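where all covariances are zero due to the assumption of independence between the feature’s distributions for each class (and, within each class, the independence of $\bar x_j$ and $S_j$ for normal samples). Letting $\hat\mu_j = \bar x_j$ and $\hat\sigma_j = S_j$, $Var(\hat\mu_j)$ and $Var(\hat\sigma_j)$ are [56]

$$Var(\hat\mu_j) = \frac{\sigma_j^2}{n_j} \tag{3.2}$$

and

$$Var(\hat\sigma_j) = \frac{\sigma_j^2}{2(n_j - 1)} \tag{3.3}$$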


Thus, to estimate Equation 3.1, the partial derivatives of $BC_3$ with respect to the normal distribution parameters, $y_j$ (where $y_j = \mu_j$ or $\sigma_j$ and $j = 1, 2, 3$), are defined as

$$\frac{\partial BC_3}{\partial y_j} \tag{3.4}$$

where separate expressions apply for $y_j = \mu_j$ and for $y_j = \sigma_j$, and some terms are common to both.
A more detailed derivation of these results is presented in the Appendix, Section A.2. The six partial derivatives of BC with respect to $\mu_j$ and $\sigma_j$ in Equation 3.4 are estimated using the numerical estimates of $\partial\theta_m^*/\partial\mu_j$ and $\partial\theta_m^*/\partial\sigma_j$ (described in Section 3.2.3; $m = 1, 2$, $j = 1, 2, 3$), together with $\hat\mu_j = \bar x_j$ and $\hat\sigma_j = S_j$. Using these estimated partial derivatives, the variance of $\widehat{BC}_3$ is estimated. The $(1-\alpha)100\%$ delta method CI for the three-class BC is

$$\widehat{BC}_3 \pm z_{\alpha/2}\sqrt{\widehat{Var}(\widehat{BC}_3)} \tag{3.5}$$
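The following Python sketch strings Equations 3.1, 3.2, 3.3, and 3.5 together for three classes. It is a minimal sketch under two stated assumptions: all $c_{i|j}p_j = 1$, so that the thresholds come from the closed forms in Equations 3.6 and 3.7 (Section 3.2.1 below), and each $\partial BC_3/\partial y_j$ is estimated by a central difference on $BC_3$ itself (the approach of Equation 3.15) instead of the analytic Equation 3.4. All function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

PHI = NormalDist().cdf


def optimal_threshold(mu_a, sd_a, mu_b, sd_b):
    """Equal-weight optimal threshold between adjacent classes (Eqs. 3.6, 3.7)."""
    if abs(sd_b - sd_a) < 1e-12:
        return 0.5 * (mu_a + mu_b)                              # Eq. 3.7
    a = mu_b - mu_a
    b = sd_b / sd_a
    root = math.sqrt(a * a + 2 * (b * b - 1) * sd_a ** 2 * math.log(b))
    return (mu_a * (b * b - 1) - a + b * root) / (b * b - 1)    # Eq. 3.6


def bc3(params):
    """Eq. 2.21 with every c_{i|j} p_j = 1; params = (mu1, sd1, mu2, sd2, mu3, sd3)."""
    mu1, sd1, mu2, sd2, mu3, sd3 = params
    t1 = optimal_threshold(mu1, sd1, mu2, sd2)
    t2 = optimal_threshold(mu2, sd2, mu3, sd3)
    return ((1 - PHI((t1 - mu1) / sd1))                            # class 1 errors
            + PHI((t1 - mu2) / sd2) + (1 - PHI((t2 - mu2) / sd2))  # class 2 errors
            + PHI((t2 - mu3) / sd3))                               # class 3 errors


def delta_ci_bc3(samples, alpha=0.05, eps=1e-5):
    """Delta method CI of Eq. 3.5, with Var from Eqs. 3.1-3.3."""
    params, variances = [], []
    for x in samples:
        n = len(x)
        s = statistics.stdev(x)
        params += [statistics.fmean(x), s]
        variances += [s * s / n, s * s / (2 * (n - 1))]          # Eqs. 3.2, 3.3
    estimate = bc3(params)
    var = 0.0
    for i, v in enumerate(variances):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        # Thresholds are recomputed at the perturbed parameter values
        deriv = (bc3(up) - bc3(down)) / (2 * eps)
        var += deriv * deriv * v
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(var)
    return estimate, estimate - half_width, estimate + half_width


print(round(bc3([-1, 1, 0, 1, 1, 1]), 4))  # 1.2342
```

With the means $(-1, 0, 1)$ and unit standard deviations, `bc3` reproduces the first $BC_3$ value (1.23) of the normal scenarios in Table 3.1, which is a convenient sanity check before applying `delta_ci_bc3` to three lists of observed feature values.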


Confidence intervals around the optimal thresholds, in addition to the CI for BC, are also of interest. There are three cases for determining the optimal thresholds in this parametric framework. First, when all $c_{i|j}p_j$ are equal, for $i \ne j$, the optimal thresholds may be found in the same way as for J (see Section 2.5.2). Therefore, the solution for the optimal thresholds is

$$\theta_m^* = \frac{\mu_m(b^2 - 1) - a + b\sqrt{a^2 + 2(b^2 - 1)\sigma_m^2\ln(b)}}{b^2 - 1} \tag{3.6}$$

where $a = \mu_{m+1} - \mu_m$ and $b = \sigma_{m+1}/\sigma_m$, $m = 1, \ldots, k-1$ [56]. Second, if $\sigma_m = \sigma_{m+1}$ (so that $b = 1$ and Equation 3.6 is undefined), the optimal point is the midpoint between the means [56]:

$$\theta_m^* = \frac{\mu_m + \mu_{m+1}}{2} \tag{3.7}$$

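Finally, when the $c_{i|j}p_j$, $i \ne j$, are not all equal, the optimal thresholds must be estimated using numerical minimization (see Section 3.2.3). Whether $\theta_m^*$ is found using Equation 3.6, Equation 3.7, or numerically, the optimal thresholds’ estimates are functions of the sample means and variances ($\hat\theta_m^* = f(\bar{\mathbf x}, \mathbf S^2)$). By Theorem 4, $\hat\theta_m^*$ is Asymptotic-Normal$[\theta_m^*, Var(\hat\theta_m^*)]$, and the delta method approximate variance for each of the two optimal thresholds is given by

$$\widehat{Var}(\hat\theta_m^*) = \sum_{j=1}^{3}\left[\left(\frac{\partial\theta_m^*}{\partial\mu_j}\right)^2 Var(\hat\mu_j) + \left(\frac{\partial\theta_m^*}{\partial\sigma_j}\right)^2 Var(\hat\sigma_j)\right] \tag{3.8}$$

This estimate provides a $(1-\alpha)100\%$ delta method CI for each optimal threshold of $\hat\theta_m^* \pm z_{\alpha/2}\sqrt{\widehat{Var}(\hat\theta_m^*)}$, as was demonstrated in [65].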
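For the equal-weight case, $\theta_m^*$ in Equation 3.6 depends only on the parameters of classes $m$ and $m+1$, so the sum in Equation 3.8 reduces to four nonzero terms. The minimal Python sketch below (function names hypothetical, equal weights assumed) computes the threshold, checks that the two class densities cross there, and forms the delta method CI with central-difference derivatives in the spirit of Equation 3.13.

```python
import math
from statistics import NormalDist


def optimal_threshold(mu_a, sd_a, mu_b, sd_b):
    """Equal-weight optimal threshold (Eq. 3.6, or Eq. 3.7 when sd_a == sd_b)."""
    if abs(sd_b - sd_a) < 1e-12:
        return 0.5 * (mu_a + mu_b)
    a = mu_b - mu_a
    b = sd_b / sd_a
    root = math.sqrt(a * a + 2 * (b * b - 1) * sd_a ** 2 * math.log(b))
    return (mu_a * (b * b - 1) - a + b * root) / (b * b - 1)


def delta_ci_threshold(mu_a, sd_a, n_a, mu_b, sd_b, n_b, alpha=0.05, eps=1e-5):
    """Threshold estimate +/- z_{alpha/2} * sqrt(Var), with Var from Eq. 3.8."""
    params = [mu_a, sd_a, mu_b, sd_b]
    variances = [sd_a ** 2 / n_a, sd_a ** 2 / (2 * (n_a - 1)),
                 sd_b ** 2 / n_b, sd_b ** 2 / (2 * (n_b - 1))]
    estimate = optimal_threshold(*params)
    var = 0.0
    for i, v in enumerate(variances):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        # Central difference for d(theta*)/d(y_j), as in Eq. 3.13
        deriv = (optimal_threshold(*up) - optimal_threshold(*down)) / (2 * eps)
        var += deriv * deriv * v
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var)
    return estimate, estimate - half_width, estimate + half_width


t = optimal_threshold(0.0, 1.0, 2.0, 2.0)
# At the equal-weight optimum the two class densities are equal
print(NormalDist(0.0, 1.0).pdf(t), NormalDist(2.0, 2.0).pdf(t))
print(delta_ci_threshold(0.0, 1.0, 50, 2.0, 2.0, 50))
```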

3.2.2  Bayes  Cost  and  Optimal  Thresholds,  k  classes. 

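These methods extend easily to $k > 3$ classes. When there are $k$ classes, BC may be expressed using the normal CDF as

$$BC = \sum_{j=1}^{k}\sum_{i \ne j} c_{i|j}p_j\left[\Phi\left(\frac{\theta_i^* - \mu_j}{\sigma_j}\right) - \Phi\left(\frac{\theta_{i-1}^* - \mu_j}{\sigma_j}\right)\right], \quad \theta_0^* = -\infty,\ \theta_k^* = \infty \tag{3.9}$$

The $(1-\alpha)100\%$ CI for BC is still $\widehat{BC} \pm z_{\alpha/2}\sqrt{\widehat{Var}(\widehat{BC})}$, where

$$\widehat{Var}(\widehat{BC}) = \sum_{j=1}^{k}\left[\left(\frac{\partial BC}{\partial \mu_j}\right)^2 Var(\hat\mu_j) + \left(\frac{\partial BC}{\partial \sigma_j}\right)^2 Var(\hat\sigma_j)\right] \tag{3.10}$$

and the partial derivatives may be estimated using Equation 3.4 for three classes, Equations A.4 through A.11 in Appendix A.3 for four classes, or the methods described in Section 3.2.3 below for any $k$ classes. Similarly, the $(1-\alpha)100\%$ CI for each optimal threshold is $\hat\theta_m^* \pm z_{\alpha/2}\sqrt{\widehat{Var}(\hat\theta_m^*)}$, where

$$\widehat{Var}(\hat\theta_m^*) = \sum_{j=1}^{k}\left[\left(\frac{\partial\theta_m^*}{\partial\mu_j}\right)^2 Var(\hat\mu_j) + \left(\frac{\partial\theta_m^*}{\partial\sigma_j}\right)^2 Var(\hat\sigma_j)\right] \tag{3.11}$$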




When all $c_{i|j}p_j$ are equal (WLOG assumed to be one), for $i \ne j$, the partial derivatives are given in Equations 2.28 through 2.31, with class 1 and class 2 replaced by class $j = m$ and class $j = m + 1$ for the optimal threshold $\theta_m^*$ ($m = 1, \ldots, k-1$). When the costs and prevalences are not equal, the optimal thresholds must be found numerically and the partial derivatives are estimated using Equation 3.13. Note that these CIs for $k$ classes define the CIs for $k = 2$ and $3$ classes as well.

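Finally, it is worth noting that there exists covariance between the $m$th and $(m+1)$th thresholds, due to the thresholds’ shared dependence on the feature’s parameters for the class between them, class $m+1$. This covariance may be estimated with the delta method as

$$\widehat{Cov}(\hat\theta_m^*, \hat\theta_{m+1}^*) = \frac{\partial\theta_m^*}{\partial\mu_{m+1}}\frac{\partial\theta_{m+1}^*}{\partial\mu_{m+1}}Var(\hat\mu_{m+1}) + \frac{\partial\theta_m^*}{\partial\sigma_{m+1}}\frac{\partial\theta_{m+1}^*}{\partial\sigma_{m+1}}Var(\hat\sigma_{m+1}) \tag{3.12}$$

[36]. This covariance may be used for constructing confidence regions around pairs of optimal thresholds.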


3.2.3  A  Method  for  Numerically  Estimating  Partial  Derivatives. 

Although the solutions for the optimal thresholds, $\theta_m^*$, are functions of the distributional parameters, they generally must be found numerically when minimizing BC. Therefore, the partial derivatives $\partial\theta_m^*/\partial\mu_j$ and $\partial\theta_m^*/\partial\sigma_j$ (where $j = 1, \ldots, k$ represents the true class and $m = 1, \ldots, k-1$ denotes the optimal thresholds) must also be estimated numerically. This can be accomplished using the two-point central difference method [25, p. 254]. Applying this method, for $y_j = \mu_j$ or $\sigma_j$,

$$\frac{\partial\theta_m^*}{\partial y_j} \approx \frac{\theta_m^*(y_j + \varepsilon) - \theta_m^*(y_j - \varepsilon)}{2\varepsilon} \tag{3.13}$$

leaving all other normal parameters constant for each calculation. The term $\theta_m^*(y_j \pm \varepsilon)$ is determined using the same numerical minimization method as that used to find the optimal threshold values. The truncation error for this difference method is $O(\varepsilon^2)$. The value of $\varepsilon$ should be chosen to minimize the error of the approximation, which for double precision (using 64 bits to store values) is approximately

$$\text{Error} \approx \frac{10^{-16}}{\varepsilon} + O(\varepsilon^2) \tag{3.14}$$

This error would be minimized for $\varepsilon$ on the order of $10^{-16/3}$. Therefore, a small $\varepsilon$ should be chosen; however, $\varepsilon$ should not be so small that the round-off term $10^{-16}/\varepsilon$ dominates and inflates the error caused by computer precision.
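The trade-off in Equation 3.14 can be seen with any function whose derivative is known. The Python sketch below uses $\exp$ at $y = 1$ as a stand-in for $\theta_m^*(y_j)$ (an assumption, since $\theta_m^*$ generally has no closed form) and prints the absolute error of Equation 3.13 for $\varepsilon = 10^{-1}, \ldots, 10^{-12}$. The errors typically shrink as $\varepsilon$ decreases (truncation error) and then grow again once round-off dominates, with the smallest values near the rule-of-thumb $\varepsilon \approx 10^{-16/3}$. The function names are hypothetical.

```python
import math


def central_difference(f, y, eps):
    """Two-point central difference of Eq. 3.13."""
    return (f(y + eps) - f(y - eps)) / (2 * eps)


def step_size_errors(f, exact_derivative, y, exponents=range(1, 13)):
    """Absolute error of the central difference for eps = 10^-e."""
    return {
        e: abs(central_difference(f, y, 10.0 ** -e) - exact_derivative)
        for e in exponents
    }


errors = step_size_errors(math.exp, math.exp(1.0), 1.0)
for e, err in errors.items():
    print(f"eps = 1e-{e:02d}  error = {err:.2e}")
print("rule-of-thumb eps:", 10 ** (-16 / 3))
```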
The partial derivatives of BC can be found analytically; they were presented in Section 3.2.1 for three classes and in the Appendix, Section A.3, for four classes. For any $k$ classes, the partial derivatives of BC can be approximated by

$$\frac{\partial BC}{\partial y_j} \approx \frac{BC(y_j + \varepsilon) - BC(y_j - \varepsilon)}{2\varepsilon} \tag{3.15}$$

where $BC(y_j \pm \varepsilon)$ is found using Equation 3.9 for $y_j = \mu_j$ or $\sigma_j$, with values of $\varepsilon$ chosen as discussed for $\theta_m^*$.
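A minimal Python sketch of Equations 3.9 and 3.15 follows. The function `bc_k` evaluates Equation 3.9 for arbitrary weights $c_{i|j}p_j$ at given thresholds; to keep the example self-contained, the thresholds themselves are only computed for the equal-weight case (Equations 3.6 and 3.7), so the numerical minimization needed for unequal weights is not shown. All names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def bc_k(mus, sds, thresholds, weights):
    """Eq. 3.9: sum over true class j and assigned class i != j of
    weights[i][j] * P(theta_{i-1} < X <= theta_i | class j), where
    weights[i][j] = c_{i|j} p_j, theta_0 = -inf, and theta_k = +inf."""
    k = len(mus)
    bounds = [-math.inf] + list(thresholds) + [math.inf]
    total = 0.0
    for j in range(k):
        for i in range(k):
            if i != j:
                prob = (PHI((bounds[i + 1] - mus[j]) / sds[j])
                        - PHI((bounds[i] - mus[j]) / sds[j]))
                total += weights[i][j] * prob
    return total


def optimal_threshold(mu_a, sd_a, mu_b, sd_b):
    """Equal-weight optimal threshold (Eqs. 3.6 and 3.7)."""
    if abs(sd_b - sd_a) < 1e-12:
        return 0.5 * (mu_a + mu_b)
    a = mu_b - mu_a
    b = sd_b / sd_a
    root = math.sqrt(a * a + 2 * (b * b - 1) * sd_a ** 2 * math.log(b))
    return (mu_a * (b * b - 1) - a + b * root) / (b * b - 1)


def bc_equal_weights(params):
    """params = [mu_1, sd_1, ..., mu_k, sd_k]; every c_{i|j} p_j = 1."""
    mus, sds = params[0::2], params[1::2]
    k = len(mus)
    thresholds = [optimal_threshold(mus[m], sds[m], mus[m + 1], sds[m + 1])
                  for m in range(k - 1)]
    ones = [[1.0] * k for _ in range(k)]
    return bc_k(mus, sds, thresholds, ones)


def numeric_partials(bc, params, eps=1e-5):
    """Eq. 3.15: central-difference dBC/dy_j for every mu_j and sigma_j."""
    partials = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        partials.append((bc(up) - bc(down)) / (2 * eps))
    return partials


print(bc_equal_weights([-1, 1, 0, 1, 1, 1]))             # three classes
print(numeric_partials(bc_equal_weights, [0, 1, 2, 1]))  # two classes
```

For two classes with a common $\sigma$, $BC = 2[1 - \Phi((\mu_2 - \mu_1)/(2\sigma))]$, so $\partial BC/\partial\mu_1 = \phi((\mu_2-\mu_1)/(2\sigma))/\sigma$; with $\mu = (0, 2)$ and $\sigma = 1$, the first numerical partial should be close to $\phi(1) \approx 0.242$.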


3.3  Generalized  Confidence  Intervals 

In [33], GCIs¹ are developed for the two-class J as an exact method for constructing CIs around J and the optimal threshold when the feature used for classification is independently and normally distributed for each class. Define $\boldsymbol\zeta = (\theta, \boldsymbol\delta)$, where $\theta$ is the parameter of interest and $\boldsymbol\delta$ is a vector of nuisance parameters.

Definition 1 (Generalized Pivotal Quantity).

Let $R = r(\mathbf X; \mathbf x, \boldsymbol\zeta)$ be a function of $\mathbf X$ and possibly $\mathbf x$ and $\boldsymbol\zeta$ as well. The random quantity $R$ is said to be a generalized pivotal quantity if it has the following two properties:

Property A: $R$ has a probability distribution that is free of unknown parameters.

Property B: $r_{obs}$, defined as $r_{obs} = r(\mathbf x; \mathbf x, \boldsymbol\zeta)$, does not depend on the nuisance parameters, $\boldsymbol\delta$. [73, p. 146]

Definition 2 (Generalized Confidence Interval).

If the subset $C_\gamma$ of the sample space of $R$ satisfies $Pr(R \in C_\gamma) = \gamma$, then the subset $\Theta_c$ of the parameter space given by $\Theta_c(\gamma) = \{\theta \in \Theta \mid r(\mathbf x; \mathbf x, \boldsymbol\zeta) \in C_\gamma\}$ is said to be a $100\gamma\%$ GCI for $\theta$. [73, p. 146]

3.3.1  Youden  Index,  k  Classes. 

In [33], a GCI is developed for the two-class J by constructing generalized pivotal quantities (GPQs) for $\mu_j$ and $\sigma_j$ ($j = 1, 2$), and then using these pivotal quantities to construct GPQs for the optimal threshold and J. For a classification system with $k$ classes and a normally distributed feature, there will be $k-1$ optimal thresholds, one between each pair of adjacent normal distributions. Therefore, in order to extend this method to $k$ classes, $k-1$ GPQs for the optimal thresholds must be determined and used to define the GPQ for J (defined as the sum of all correct classification rates).

¹In [66] it was noted that the implementation of these GCIs is identical to constructing the CIs via Bayesian inference using the non-informative prior $\pi(\mu, \sigma^2) \propto 1/\sigma^2$.

Each optimal threshold value is determined explicitly by the distributions of the two classes it divides [46]. For this problem, J is the parameter of interest, and the mean ($\mu_j$) and variance ($\sigma_j^2$) of each class are the nuisance parameters. Then (as is done in [33] for two classes) define


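$$R_{\mu_j} = \bar x_j - t_j\frac{S_j}{\sqrt{n_j}} \tag{3.16}$$

$$R_{\sigma_j} = S_j\sqrt{\frac{n_j - 1}{V_j}} \tag{3.17}$$

where

$$t_j = \frac{\bar x_j - \mu_j}{S_j/\sqrt{n_j}} \tag{3.18}$$

and

$$V_j = \frac{(n_j - 1)S_j^2}{\sigma_j^2} \tag{3.19}$$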

The sample mean ($\bar x_j$) and standard deviation ($S_j$) are computed from the $j$th class, $t_j \sim t_{n_j-1}$ is a t-distributed random variable with $n_j - 1$ degrees of freedom, and $V_j \sim \chi^2_{n_j-1}$ is a chi-square random variable with $n_j - 1$ degrees of freedom [12, p. 218, 223]. To find the $k-1$ GPQs for the optimal thresholds, $R_{\theta_m^*}$ (indexed on $m = 1, 2, \ldots, k-1$), first define the following GPQs ($k-1$ of each type):

$$R_{a_m} = R_{\mu_{m+1}} - R_{\mu_m} \tag{3.20}$$

$$R_{b_m} = \frac{R_{\sigma_{m+1}}}{R_{\sigma_m}} \tag{3.21}$$

Next, the GPQs for the $k-1$ optimal thresholds are computed as

$$R_{\theta_m^*} = \frac{R_{\mu_m}(R_{b_m}^2 - 1) - R_{a_m} + R_{b_m}\sqrt{R_{a_m}^2 + 2(R_{b_m}^2 - 1)R_{\sigma_m}^2\ln(R_{b_m})}}{R_{b_m}^2 - 1} \tag{3.22}$$

for $m = 1, 2, \ldots, k-1$. Using these GPQs, the GPQ for the $k$-class J is defined as

$$R_J = \sum_{j=1}^{k}\left[\Phi\left(\frac{R_{\theta_j^*} - R_{\mu_j}}{R_{\sigma_j}}\right) - \Phi\left(\frac{R_{\theta_{j-1}^*} - R_{\mu_j}}{R_{\sigma_j}}\right)\right], \quad R_{\theta_0^*} = -\infty,\ R_{\theta_k^*} = \infty \tag{3.23}$$


It is clear that $R_{\mu_j}$ and $R_{\sigma_j}$ do not depend on any unknown parameters; therefore, $R_{\theta_m^*}$ and $R_J$ (defined only with $R_{\mu_j}$ and $R_{\sigma_j}$) do not depend on unknown parameters. This satisfies property A of Definition 1. Also note that $r_{obs} = R_J(\mathbf x, \mathbf S)$ is evaluated by using $\bar x_j$ and $S_j$ in Equations 3.18 and 3.19 and then substituting Equations 3.18 and 3.19 into Equations 3.16 and 3.17, respectively. This results in $R_{\mu_j}(\mathbf x, \mathbf S) = \mu_j$, $R_{\sigma_j}(\mathbf x, \mathbf S) = \sigma_j$, and $R_{\theta_m^*}(\mathbf x, \mathbf S) = \theta_m^*$. Evaluating $R_J$ with these values gives $R_J(\mathbf x, \mathbf S) = J$; therefore, $r_{obs}$ does not depend on any nuisance parameters, and property B of Definition 1 is met.

Finally, a CI around J can be found using Monte Carlo simulation by generating a large number ($K \approx 2{,}500$) of random draws from $t_j$ and $V_j$ for each class, $j = 1, \ldots, k$. Using these values in Equations 3.16 through 3.23, $K$ values of $R_J$ are calculated. Then the $(\alpha/2)100$th and $(1-\alpha/2)100$th percentiles of $R_J$ are defined to be the lower and upper bounds, respectively, of the $(1-\alpha)100\%$ GCI around J [33]. Also note that the $(1-\alpha)100\%$ GCI around each of the $k-1$ optimal thresholds can be found similarly, using the appropriate percentiles of each $R_{\theta_m^*}$ GPQ ($m = 1, \ldots, k-1$).
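The Monte Carlo procedure above can be sketched in a few dozen lines of Python. This is a minimal sketch, not the code used for the simulations in Section 3.5: the function names are hypothetical, $V_j$ is drawn as a Gamma($\nu/2$, scale 2) variate (which is $\chi^2_\nu$), $t_j$ is built as $Z/\sqrt{W/\nu}$ from an independent normal and chi-square draw, the percentiles are simple empirical order statistics, and the sketch assumes the drawn thresholds stay ordered, which holds for well-separated classes.

```python
import math
import random
import statistics
from statistics import NormalDist

PHI = NormalDist().cdf


def optimal_threshold(mu_a, sd_a, mu_b, sd_b):
    """Closed form of Eq. 3.6 (Eq. 3.7 if the SDs coincide), used as Eq. 3.22."""
    if abs(sd_b - sd_a) < 1e-12:
        return 0.5 * (mu_a + mu_b)
    a = mu_b - mu_a
    b = sd_b / sd_a
    root = math.sqrt(a * a + 2 * (b * b - 1) * sd_a ** 2 * math.log(b))
    return (mu_a * (b * b - 1) - a + b * root) / (b * b - 1)


def gci_youden(samples, alpha=0.05, K=2500, seed=None):
    """Monte Carlo GCI for the k-class J (sum of correct classification rates)."""
    rng = random.Random(seed)
    summaries = [(statistics.fmean(x), statistics.stdev(x), len(x)) for x in samples]
    k = len(samples)
    r_j_values = []
    for _ in range(K):
        r_mu, r_sd = [], []
        for xbar, s, n in summaries:
            df = n - 1
            v = rng.gammavariate(df / 2, 2)                      # V_j ~ chi-square(df)
            w = rng.gammavariate(df / 2, 2)                      # independent chi-square
            t = rng.gauss(0.0, 1.0) / math.sqrt(w / df)          # t_j ~ t(df)
            r_mu.append(xbar - t * s / math.sqrt(n))             # Eq. 3.16
            r_sd.append(s * math.sqrt(df / v))                   # Eq. 3.17
        thresholds = [optimal_threshold(r_mu[m], r_sd[m], r_mu[m + 1], r_sd[m + 1])
                      for m in range(k - 1)]                     # Eqs. 3.20-3.22
        bounds = [-math.inf] + thresholds + [math.inf]
        r_j_values.append(sum(
            PHI((bounds[j + 1] - r_mu[j]) / r_sd[j]) - PHI((bounds[j] - r_mu[j]) / r_sd[j])
            for j in range(k)))                                  # Eq. 3.23
    r_j_values.sort()
    lower = r_j_values[int(alpha / 2 * K)]
    upper = r_j_values[math.ceil((1 - alpha / 2) * K) - 1]
    return lower, upper
```

A call such as `gci_youden([class1, class2, class3], seed=1)`, with each argument a list of observed feature values, returns the lower and upper GCI bounds; fixing `seed` makes the Monte Carlo draws reproducible.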

3.3.2  Bayes  Cost,  Equal  Weights. 

The GCI around BC from a classification system with equal $c_{i|j}p_j$, for $i \ne j$ (WLOG $p_jc_{i|j} = 1$, accomplished by scaling BC by the reciprocal of the common multiplier), is found using the GPQs for the mean, standard deviation, and $k-1$ optimal thresholds in Equations 3.16 through 3.22. The mean and variance of the feature’s distribution for each class are still the nuisance parameters, and BC is the parameter of interest. Then the GPQ for BC is

$$R_{BC} = \sum_{j=1}^{k}\sum_{i \ne j}\left[\Phi\left(\frac{R_{\theta_i^*} - R_{\mu_j}}{R_{\sigma_j}}\right) - \Phi\left(\frac{R_{\theta_{i-1}^*} - R_{\mu_j}}{R_{\sigma_j}}\right)\right], \quad R_{\theta_0^*} = -\infty,\ R_{\theta_k^*} = \infty \tag{3.24}$$

It is clear, as was discussed for $R_J$ in the previous section, that $R_{BC}$ is a GPQ meeting both properties of Definition 1. The $(1-\alpha)100\%$ GCIs around BC and the optimal thresholds may be found using Monte Carlo simulation, as was described for J and the optimal thresholds in Section 3.3.1.

3.3.3  Bayes  Cost,  Unequal  Weights. 

In this section, the GCI for BC from a classification system with unequal $c_{i|j}p_j$, for $i \ne j$, is developed. Once again, the nuisance parameters are the mean and variance of the feature’s distribution for each class, and the parameter of interest is BC. With unequal costs, the GPQs for the optimal thresholds can no longer be found using the closed-form solution in Equation 3.22. Although there is no closed-form solution for $R_{\theta_m^*}$, the optimal thresholds are functions of the mean and variance of each class and can be found with numerical minimization. The GPQ for BC is defined, now with costs and prevalences on the misclassification probabilities:

$$R_{BC} = \sum_{j=1}^{k}\sum_{i \ne j} c_{i|j}p_j\left[\Phi\left(\frac{R_{\theta_i^*} - R_{\mu_j}}{R_{\sigma_j}}\right) - \Phi\left(\frac{R_{\theta_{i-1}^*} - R_{\mu_j}}{R_{\sigma_j}}\right)\right] \tag{3.25}$$


The $k-1$ optimal threshold values’ GPQs are found numerically for each of the $K$ sets of $R_{\mu_j}$ and $R_{\sigma_j}$ values from Equations 3.16 through 3.19 (this requires $K$ numerical minimizations of Equation 3.25, resulting in $K$ values of each $R_{\theta_m^*}$ and $K$ values of $R_{BC}$). Once again, $R_{\theta_m^*} = f(\mathbf R_\mu, \mathbf R_\sigma)$, and each $R_{\theta_m^*}$ does not depend on any unknown parameters. Therefore, as was seen for J and for BC with equal weights in Sections 3.3.1 and 3.3.2, $R_{BC}$ in Equation 3.25 does not depend on unknown parameters and satisfies property A of Definition 1. Also, $R_{\mu_j}(\mathbf x, \mathbf S) = \mu_j$, $R_{\sigma_j}(\mathbf x, \mathbf S) = \sigma_j$, and $R_{\theta_m^*}(\mathbf x, \mathbf S) = \theta_m^*$, resulting in $r_{BC,obs} = R_{BC}(\mathbf x, \mathbf S) = BC$, which does not depend on nuisance parameters. This satisfies property B of Definition 1. Once again, by randomly generating $K$ values of $t_j$ and $V_j$ for each class, $K$ values of $R_{BC}$ and $R_{\theta_m^*}$ are determined with numerical minimization. Then the $(1-\alpha)100\%$ GCI around BC is given by the $(\alpha/2)100$th and $(1-\alpha/2)100$th percentiles of $R_{BC}$ (similarly, the analogous percentiles of $R_{\theta_m^*}$ are used to construct GCIs around the optimal thresholds).


3.4  Bootstrap  Methods 

Bootstrap methods were introduced by Efron in the 1970s [12, p. 478]. The bootstrap can be used for creating CIs for large or small data samples where the assumptions inherent in other methods are not met. With increasing computing power, the bootstrap has become a popular method for constructing CIs. Typically, a nonparametric bootstrap sample $\mathbf X^* = (X_1^*, \ldots, X_n^*)$ is created from a random sample $\mathbf X = (X_1, \ldots, X_n)$ by drawing a new sample of size $n$ from $\mathbf X$ with replacement. It is also possible to draw a parametric bootstrap sample, where an underlying distribution $F_X(x \mid \theta)$ is assumed known and $\hat\theta$ (or $\hat{\boldsymbol\theta}$) is an estimate of the true parameter $\theta$ (or parameters, $\boldsymbol\theta$) from the initial sample $\mathbf X$ [11]. Then the bootstrap sample $\mathbf X^*$ is created by sampling $n$ times from the distribution $F_X(x \mid \hat\theta)$. The work in this dissertation utilizes nonparametric resampling, which is the sampling procedure most commonly used. Generally, a large number ($B$) of bootstrap samples ($\mathbf X^*$) are drawn in order to construct CIs.

One common bootstrap CI assumes asymptotic normality of the parameter estimate. This is accomplished by estimating the variance of the parameter estimate from the $B$ bootstrap samples and using this variance with standard normal quantiles to construct a CI. This method generates what is known as an asymptotic normal (AN) bootstrap CI [14]. This method, however, is not robust under transformations of the parameters, and the resulting interval may include values that are not valid (for example, BC values less than zero) [11, 14]. Therefore, two other bootstrap CI methods are considered: the basic percentile (BP) bootstrap CI and the bias-corrected and accelerated (BCa) bootstrap CI. The advantage of the BP CI is that the resulting interval will not include invalid values of the parameter of interest, since the CI bounds are found as the appropriate percentiles of the $B$ bootstrap estimates of the parameter. However, a disadvantage of this method is that the coverage will be low when the distribution of the estimated parameter is not symmetric [11]. The BCa CI has the same advantage as the BP CI but also performs well for skewed distributions of the estimated parameter [11]. All three of these bootstrap CIs are implemented using the boot.ci function in the boot package in R [10, 15, 52]. For more information on the bootstrap, see [15].
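To make the AN and BP constructions concrete, the Python sketch below bootstraps the parametric $BC_3$ estimate by resampling each class separately with replacement. It is a simplified stand-in for R's `boot.ci`, not a reimplementation of it, and may differ from it in details; BCa is omitted, the weights are assumed equal so that the thresholds follow Equations 3.6 and 3.7, and all function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

PHI = NormalDist().cdf


def optimal_threshold(mu_a, sd_a, mu_b, sd_b):
    """Equal-weight optimal threshold (Eqs. 3.6 and 3.7)."""
    if abs(sd_b - sd_a) < 1e-12:
        return 0.5 * (mu_a + mu_b)
    a = mu_b - mu_a
    b = sd_b / sd_a
    root = math.sqrt(a * a + 2 * (b * b - 1) * sd_a ** 2 * math.log(b))
    return (mu_a * (b * b - 1) - a + b * root) / (b * b - 1)


def bc3_hat(samples):
    """Parametric BC_3 estimate (Eq. 2.21 with every c_{i|j} p_j = 1)."""
    (m1, s1), (m2, s2), (m3, s3) = [(statistics.fmean(x), statistics.stdev(x))
                                    for x in samples]
    t1 = optimal_threshold(m1, s1, m2, s2)
    t2 = optimal_threshold(m2, s2, m3, s3)
    return ((1 - PHI((t1 - m1) / s1)) + PHI((t1 - m2) / s2)
            + (1 - PHI((t2 - m2) / s2)) + PHI((t2 - m3) / s3))


def bootstrap_cis(samples, B=1000, alpha=0.05, seed=None):
    """AN and BP bootstrap CIs for BC_3, resampling within each class."""
    rng = random.Random(seed)
    estimate = bc3_hat(samples)
    boot = sorted(
        bc3_hat([rng.choices(x, k=len(x)) for x in samples]) for _ in range(B)
    )
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = statistics.stdev(boot)                      # bootstrap standard error
    an = (estimate - z * se, estimate + z * se)      # asymptotic normal (AN)
    bp = (boot[int(alpha / 2 * B)],                  # basic percentile (BP)
          boot[math.ceil((1 - alpha / 2) * B) - 1])
    return estimate, an, bp
```

Because the BP limits are order statistics of bootstrap $BC_3$ values, they stay within the valid range of BC, whereas the AN limits can fall below zero when $BC_3$ is small, which illustrates the disadvantage of the AN interval described above.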

The  performance  of  the  bootstrap  CIs  may  be  impacted  by  the  method  used  for  estimation  of  the 
parameter  of  interest,  as  different  estimation  techniques  result  in  different  levels  of  bias  depending 
on  the  true  scenario  (here,  classification  system  structure  as  well  as  feature  distributions).  For 
comparison to the parametric CIs for BC presented in this chapter, the point estimates of BC and $\theta^*$ are estimated parametrically, as is done, for example, in Equations 2.21 and 3.6, respectively.

3.5  Simulation  Results 

A simulation study was conducted to demonstrate the performance of the delta method and generalized CIs around BC and to compare their performance to that of available bootstrap CI methods. The performance of CIs around the optimal thresholds is also evaluated. Various classification scenarios are considered, including different sample sizes, underlying distributions of the feature used for classification, differing costs associated with the misclassifications, and classification accuracy (measured by the BC value). All scenarios assume a classifier with three classes and two optimal thresholds ($\theta_1^* < \theta_2^*$) to distinguish between adjacent classes. Thus, $BC_3$ could range from completely accurate, $BC_3 = 0.0$, to misclassifying all observations, $BC_3 = 3.0$. Five $BC_3$ values are chosen to demonstrate a range of classification system performances (all better than chance classification, which occurs for $BC_3 = 1.5$). These values are $BC_3$ = 0.27, 0.42, 0.63, 0.91, and 1.23. The distributional parameters for each class are determined by varying each distribution’s mean and variance in order to achieve the desired $BC_3$ value. The parameters for all scenarios are presented in Table 3.1.

In Section 3.5.1, it is assumed that all $c_{i|j}p_j$ are equal, for $i \ne j$. Using this equal cost/prevalence structure, various distributions of the feature are considered in order to study the impact of non-normal distributions on the performance of the CI methods. Therefore, the CIs are applied as described in this chapter, using the methods derived for normally distributed features. In Section 3.5.2, two additional cost structures are used to determine whether unequal cost scenarios alter the CIs’ performance. These different costs are applied to the same normal distribution settings in Table 3.1 ($\sigma_3 = 1$); however, the resulting $BC_3$ values change due to the multiplication of the different costs on the misclassification probabilities. Although the normal distributions are unchanged, the different cost structures also result in different optimal thresholds between the classes, as is expected when accounting for the costs placed on the different classification errors.

The bootstrap CI methods considered for comparison are the BP, AN, and BCa. All bootstrap CIs utilize 1,000 nonparametric resamples and estimate $BC_3$ parametrically (Equation 2.21). The optimal thresholds ($\theta_1^* < \theta_2^*$) are found with Equation 3.6 or via numerical minimization for each resample, for equal and unequal costs, respectively. Equation 2.21 is also used to estimate $BC_3$ for the delta method CIs. A similar parametric formulation is used for the GCIs, eliminating the impact of bias on comparisons of coverage probability between the different CI methods. Random samples of sizes $n_j = 10$ to $250$ are generated for each of the three classes from the appropriate distributions for all scenarios. This is repeated 5,000 times (3,000 times for the GCIs due to computational time) to determine the coverage probability, left and right coverage probability (for CI symmetry), and the average CI length. Absolute bias of the point estimates is also determined and is discussed throughout the following sections.

Table 3.1: Distributional parameters for the parametric CI simulation.

| Distribution | $BC_3$ | Class 1 | | Class 2 | | Class 3 | |
|---|---|---|---|---|---|---|---|
| Normal ($\sigma_3 = 1$) | | $\mu$ | $\sigma$ | $\mu$ | $\sigma$ | $\mu$ | $\sigma$ |
| | 1.23 | -1 | 1 | 0 | 1 | 1 | 1 |
| | 0.91 | -1.5 | 1 | 0 | 1 | 1.5 | 1 |
| | 0.63 | -2 | 1 | 0 | 1 | 2 | 1 |
| | 0.42 | -2.5 | 1 | 0 | 1 | 2.5 | 1 |
| | 0.27 | -3 | 1 | 0 | 1 | 3 | 1 |
| Normal ($\sigma_3 = 2$) | | $\mu$ | $\sigma$ | $\mu$ | $\sigma$ | $\mu$ | $\sigma$ |
| | 1.23 | -1 | 1 | 0 | 1 | 1.2 | 2 |
| | 0.91 | -1.5 | 1 | 0 | 1 | 2 | 2 |
| | 0.63 | -2 | 1 | 0 | 1 | 2.85 | 2 |
| | 0.42 | -2.5 | 1 | 0 | 1 | 3.6 | 2 |
| | 0.27 | -3 | 1 | 0 | 1 | 4.4 | 2 |
| Normal ($\sigma_3 = 4$) | | $\mu$ | $\sigma$ | $\mu$ | $\sigma$ | $\mu$ | $\sigma$ |
| | 1.23 | -1 | 1 | 0 | 1 | 1 | 4 |
| | 0.91 | -1.5 | 1 | 0 | 1 | 2.6 | 4 |
| | 0.63 | -2 | 1 | 0 | 1 | 4.2 | 4 |
| | 0.42 | -2.5 | 1 | 0 | 1 | 5.5 | 4 |
| | 0.27 | -3 | 1 | 0 | 1 | 6.9 | 4 |
| Gamma | | $\alpha$ | $\beta$ | $\alpha$ | $\beta$ | $\alpha$ | $\beta$ |
| | 1.23 | 1.3 | 1 | 2 | 1.5 | 3 | 1.738 |
| | 0.91 | 1.3 | 1 | 2 | 1.5 | 3 | 3.544 |
| | 0.63 | 1.3 | 1 | 2 | 1.5 | 5 | 5.340 |
| | 0.42 | 1.3 | 1 | 2.3 | 3.7 | 5 | 6.463 |
| | 0.27 | 1.3 | 1 | 2.3 | 3.7 | 5 | 13.696 |
| Normal mixtures | | $\tfrac12 N(\mu, \sigma)$ | $\tfrac12 N(\mu, \sigma)$ | $\mu$ | $\sigma$ | $\tfrac12 N(\mu, \sigma)$ | $\tfrac12 N(\mu, \sigma)$ |
| | 1.23 | $\tfrac12 N(-.2988, 1)$ | | 0 | 1 | $\tfrac12 N(.800, 1)$ | $\tfrac12 N(3.600, 1)$ |
| | 0.91 | $\tfrac12 N(-2.235, 1)$ | $\tfrac12 N(-1, 2)$ | 0 | 1 | $\tfrac12 N(.800, 1)$ | $\tfrac12 N(3.600, 1)$ |
| | 0.63 | $\tfrac12 N(-4.5, 1)$ | $\tfrac12 N(-2, 2)$ | 0 | 1 | $\tfrac12 N(1.200, 1)$ | $\tfrac12 N(3.600, 1)$ |
| | 0.42 | $\tfrac12 N(-4.5, 1)$ | $\tfrac12 N(-2, 2)$ | 0 | 1 | $\tfrac12 N(2.417, 1)$ | $\tfrac12 N(4.817, 1)$ |
| | 0.27 | $\tfrac12 N(-4.5, 1)$ | $\tfrac12 N(-2, 2)$ | 0 | 1 | $\tfrac12 N(5.210, 1)$ | $\tfrac12 N(7.610, 1)$ |

All simulations are run in R using the boot package, and numerical minimization of BC is performed using the optim function with method "L-BFGS-B" [10, 15, 52]. The partial derivatives of the optimal thresholds with respect to the normal distribution parameters are found numerically, as described in Section 3.2.3, with the same optim function. The partial derivatives of BC with respect to the normal distribution parameters are calculated with Equation 3.4. Because this simulation produces a large number of numeric results, the tables of results are given in the Appendix, Section B.1. A summary of these results follows.
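The minimization step can be illustrated with a minimal Python sketch. It assumes the equal-cost, equal-prevalence setting of Section 3.5.1 (every $C_{i|j}p_j = 1$), three normal classes on a single feature, and the ordered decision rule "class 1 if $x \leq \theta_1$, class 2 if $\theta_1 < x \leq \theta_2$, class 3 otherwise". Under these assumptions $BC_3$ splits into one term in $\theta_1$ and one in $\theta_2$. The sketch therefore uses a golden-section search on each threshold as a dependency-free stand-in for optim's L-BFGS-B. All function names are hypothetical.

```python
import math


def norm_cdf(x, mu, sigma):
    """Normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def golden_min(f, lo, hi, tol=1e-9):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):      # minimizer lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                # minimizer lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def bc3(t1, t2, mus, sigmas):
    """Sum of the six misclassification probabilities (all C_{i|j} p_j = 1)."""
    (m1, m2, m3), (s1, s2, s3) = mus, sigmas
    miss1 = 1.0 - norm_cdf(t1, m1, s1)                         # class 1 called 2 or 3
    miss2 = norm_cdf(t1, m2, s2) + 1.0 - norm_cdf(t2, m2, s2)  # class 2 called 1 or 3
    miss3 = norm_cdf(t2, m3, s3)                               # class 3 called 1 or 2
    return miss1 + miss2 + miss3


def optimal_bc3(mus, sigmas):
    """Minimize BC3 one threshold at a time, searching between adjacent means.

    Assumes ordered class means and that each one-dimensional term is unimodal
    on that bracket (true for equal variances).
    """
    (m1, m2, m3), (s1, s2, s3) = mus, sigmas
    t1 = golden_min(lambda t: 1.0 - norm_cdf(t, m1, s1) + norm_cdf(t, m2, s2), m1, m2)
    t2 = golden_min(lambda t: 1.0 - norm_cdf(t, m2, s2) + norm_cdf(t, m3, s3), m2, m3)
    return t1, t2, bc3(t1, t2, mus, sigmas)


t1, t2, bc = optimal_bc3((-1.0, 0.0, 1.0), (1.0, 1.0, 1.0))
print(round(t1, 4), round(t2, 4), round(bc, 4))
```

For the first normal row of Table 3.1 (means -1, 0, 1 and unit variances), the search should recover $\theta_1^* \approx -0.5$, $\theta_2^* \approx 0.5$, and $BC_3 = 4(1 - \Phi(0.5)) \approx 1.234$. With unequal variances, the bracket between adjacent means may not contain the optimum. A gradient method such as L-BFGS-B over both thresholds is the more general choice in that case.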

3.5.1  Equal  Costs  and  Prevalences. 

All costs and prevalences are assumed equal, with a multiplier of one on each misclassification probability (i.e., $C_{i|j} = 3$ and $p_j = 1/3$, so that $C_{i|j}p_j = 1$). Four different feature distributions are simulated: normal, gamma, gamma transformed to normal (via Box-Cox), and normal mixtures. In addition, three normal distribution scenarios are considered: one with all $\sigma_j = 1$, one with $\sigma_1 = \sigma_2 = 1$ and $\sigma_3 = 2$, and one with $\sigma_1 = \sigma_2 = 1$ and $\sigma_3 = 4$.

3.5.1.1 Performance of Confidence Intervals around Bayes Cost.

The coverage probability and length of the delta, generalized, and bootstrap CIs when all $C_{i|j}p_j$ are equal for $i \neq j$ are presented in Table B.1 for a feature with independent normal distributions for each class, and in Table B.2 for a feature that is not normally distributed. In general, the delta method, generalized, and BCa bootstrap CIs for $BC_3$ perform similarly to one another and better than the other two bootstrap CIs. When the feature is normally distributed (equal or unequal variances), the lengths of all intervals are similar for $n_j \geq 50$, and the delta method and generalized CIs are slightly longer than the bootstrap CIs for $n_j = 10$. For $n_j = 10$, however, the delta method CI has slightly better coverage than the BCa CI, and the generalized CI has the best coverage (it is the only method to achieve coverage of at least 95%). For $n_j \geq 50$, the delta method and BCa CIs have similar, good coverage (≈93-95%). The GCI has better coverage than the delta method and BCa bootstrap CIs for all sample sizes (≈95-96%), with comparable lengths. Changing the value of $\sigma_3$ does not noticeably affect the coverage of any of the methods.

The symmetry of the CIs for the normally distributed features is shown in Figure 3.1 for the delta method CI and in Figure 3.2 for the GCI, with $\sigma_3 = 1$ and $\sigma_3 = 4$ in rows 1 and 2, respectively. The delta method CIs around $BC_3$ are skewed left in both scenarios, and the skew becomes less extreme as $n_j$ increases. The GCIs show the opposite skew, although it is much less extreme than that of the delta method (Right - Left coverage ∈ [-.04, .04] for the GCI, compared with Right - Left coverage ∈ [-.10, 0] for the delta method). As expected, the bias of the $BC_3$ estimates is low across all scenarios for the normally distributed features (absolute bias ∈ [.00003, .05]). In general, the absolute bias decreases as $n_j$ increases and increases as the $BC_3$ value increases (less accurate classification).
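The simulation summaries used in this section are coverage, average length, Right - Left coverage, and absolute bias. A short Python sketch can compute all four for one scenario. This section does not spell out the definitions of left and right coverage, so the ones below are assumptions. Left coverage is the proportion of intervals whose lower bound does not exceed the true value. Right coverage is the proportion whose upper bound is not below it. A negative difference then means the intervals miss on the high side more often than on the low side. The function name is hypothetical.

```python
def summarize_intervals(intervals, estimates, true_value):
    """Summaries of simulated CIs for one scenario.

    intervals: list of (lower, upper) pairs, one per simulated data set
    estimates: point estimates from the same simulated data sets
    """
    m = len(intervals)
    coverage = sum(lo <= true_value <= hi for lo, hi in intervals) / m
    # Assumed definitions: each side counts intervals that do not miss on that side.
    left = sum(lo <= true_value for lo, _ in intervals) / m
    right = sum(true_value <= hi for _, hi in intervals) / m
    return {
        "coverage": coverage,
        "avg_length": sum(hi - lo for lo, hi in intervals) / m,
        "right_minus_left": right - left,
        "abs_bias": abs(sum(estimates) / len(estimates) - true_value),
    }


demo = summarize_intervals([(0.0, 2.0), (1.0, 3.0), (-1.0, 0.5)], [1.1, 1.9, -0.2], 1.0)
print(demo)
```

In the toy call, the third interval lies entirely below the true value. Coverage is therefore 2/3, and the Right - Left difference is -1/3, which is the "skewed left" pattern in the sign convention assumed here.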

When the classification feature has an independent gamma distribution for each class and is not transformed to normality, the coverage probability of all CI methods drops sharply (see Table B.2). For all sample sizes in this scenario, the delta method and generalized CIs perform better than the BCa CI for accurate tests ($BC_3$ = 0.27 and 0.42), worse than the BCa CI for very inaccurate tests ($BC_3$ = 1.23), and similarly to the BCa CI for the other two scenarios. The one exception is $n_j = 10$, where the GCI performs better than the delta and BCa CIs. The bias of the estimates for the gamma distributed feature is slightly worse than for the normally distributed feature (absolute bias ∈ [.001, .09]). It follows the same trends as for the normal feature with respect to $n_j$ and the $BC_3$ value.

When the feature is gamma distributed and transformed to normality, the coverage probability improves (Table B.2). Overall, however, the coverage is slightly worse than when the feature is normally distributed, especially for the accurate scenarios ($BC_3$ = 0.27). The GCI has a slight coverage advantage in this distributional scenario, although its intervals are longer than the delta and BCa CIs. The bias of the $BC_3$ estimates is very similar to that for normally distributed features (absolute bias ∈ [.00002, .06]) and again follows similar trends with $n_j$ and $BC_3$.

Finally, when the feature is distributed as independent normal mixtures for each class, the coverage probability of all methods is erratic and poor, with the BCa CIs performing slightly better than the other methods (Table B.2). The bias of the $BC_3$ estimates for these distributions is also the worst of all scenarios considered (absolute bias ∈ [.001, .13]). It improves only slightly as $n_j$ increases and as the $BC_3$ value decreases.
3.5.1.2 Performance of Confidence Intervals around Optimal Thresholds.

The coverage probability and length of the delta, generalized, and all bootstrap CIs when all costs are equal and the feature is normally distributed are presented in Table B.3 for $\theta_1^*$ and Table B.4 for $\theta_2^*$. The delta method and generalized CIs both perform well with respect to coverage for $\theta_1^*$ and $\theta_2^*$ (≈91-97%). The GCI is the only method that achieves or exceeds the desired 95% coverage for $n_j = 10$. Achieving this coverage, however, makes its CIs slightly longer than those of the other methods. For $n_j \geq 50$, the delta method, generalized, and AN bootstrap CIs perform similarly. When the variances are equal ($\sigma_3 = 1$), the GCIs have the best coverage over all sample sizes and are only slightly longer in some scenarios.

When the variances are not equal ($\sigma_3$ = 2 or 4), the coverage and lengths of all CI methods for $\theta_1^*$ are unchanged from the equal variance scenario. However, $\theta_2^*$ depends on the third class's distributional parameters, so the AN bootstrap CI has worse coverage around $\theta_2^*$ for $\sigma_3$ = 2 or 4. The delta method's coverage and lengths stay the same, and the GCI's performance also remains fairly constant. When $\sigma_3 \neq 1$, the BP and BCa bootstrap CIs perform similarly to each other and better than the AN bootstrap CI.

When the variances are equal, the bias of the two optimal threshold estimates is equally low (absolute bias ∈ [.00006, .03]). Changing the variance structure has no impact on the bias of the $\theta_1^*$ estimate. However, the maximum absolute bias of the $\theta_2^*$ estimate increases from 0.02 to 0.05 when the variance of the third class changes. Symmetry is plotted for the delta method CIs (Figure 3.1) and the GCIs (Figure 3.2) around both optimal thresholds for $\sigma_3 = 1$ and $\sigma_3 = 4$, in rows 1 and 2, respectively. For larger values of $\sigma_3$, the delta method CI around $\theta_2^*$ becomes left skewed (row 2, Figure 3.1). Once again, the GCI is less skewed than the delta method CI. The increase in $\sigma_3$ appears to slightly affect the symmetry of the GCI around $\theta_2^*$, but this change is very small compared with that seen for the delta method CI.

When the feature's distribution for each class is an independent gamma, all CI methods for $\theta_1^*$ and $\theta_2^*$ perform extremely poorly, and performance worsens as $n_j$ increases (Tables B.5 and B.6 for $\theta_1^*$ and $\theta_2^*$, respectively). A Box-Cox transformation slightly improves the performance of all methods (Tables B.5 and B.6), but performance is still poor and coverage is erratic. With the Box-Cox transformation applied to the gamma distributions, the GCI and AN bootstrap CI have a slight coverage advantage in most scenarios, although this advantage is minimal. The bias of the optimal threshold estimates is also poor for both the gamma and the transformed gamma distributions.

For the untransformed gamma distributions, the absolute bias of $\theta_1^*$ ranges from .01 to .9 and is largest for $BC_3$ values of 0.27 and 0.42. The absolute bias of $\theta_2^*$ ranges from .0009 to 1.94 and is smallest for a $BC_3$ value of 0.63. For both optimal thresholds, the absolute bias increases as $n_j$ increases. For the transformed gamma distributions, the absolute bias of $\theta_1^*$ ranges from .02 to 4.65 and that of $\theta_2^*$ ranges from .04 to 1.4, indicating worse estimation than for the untransformed gamma distributions. Both threshold estimates have the largest bias for $n_j = 10$ and similar bias for $n_j \geq 50$ (absolute bias ∈ [.02, .09]).

With the normal mixtures, all CI methods have higher coverage for $\theta_2^*$ than in both gamma scenarios, but not for $\theta_1^*$ (coverage probability for $\theta_1^*$ with the normal mixtures is very poor). The absolute bias of $\theta_1^*$ ranges from .04 to .17 and again increases as $n_j$ increases. This bias is lowest for $BC_3$ values of 0.91 and 1.23, which also correspond to the best coverage for $\theta_1^*$. The absolute bias of $\theta_2^*$ ranges from .000004 to .2 and decreases as $n_j$ increases. As the $BC_3$ value decreases, both this bias and the coverage probability increase.

The normal mixture distributions for the third class combine normals with the same variance (equal to one) and different means. The normal mixtures for the first class differ in both variance and mean. The shape of the normal mixture therefore affects the performance of the CI around the adjacent threshold. The CIs around $\theta_2^*$, which is adjacent to the equal-variance mixtures, performed fairly well compared with those around $\theta_1^*$, which is adjacent to the mixtures with different variances.


3.5.2  Unequal  Costs. 


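The unique advantage of using BC is the ability to consider different cost structures on the misclassification outcomes. Writing the cost matrix as

$$\text{Cost} = \begin{pmatrix} C_{1|1} & C_{1|2} & C_{1|3} \\ C_{2|1} & C_{2|2} & C_{2|3} \\ C_{3|1} & C_{3|2} & C_{3|3} \end{pmatrix},$$

the two cost structures considered are

$$\text{Cost}_1 = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix} \quad \text{and} \quad \text{Cost}_2 = \begin{pmatrix} 0 & 2 & 5 \\ 1 & 0 & 3 \\ 1 & 3 & 0 \end{pmatrix}.$$

All prevalences remained the same ($p_j = 1/3$). Coverage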
Figure 3.1: Plots of the difference between right and left coverage probability (CP) for the delta method CIs around $BC_3$, $\theta_1^*$, and $\theta_2^*$, showing the symmetry of the CIs for $n_j = 10$ (dotted line), $n_j = 50$ (dashed line), $n_j = 100$ (dash-dot line), and $n_j = 250$ (long dash line). Perfect symmetry would result in values of zero, and negative values indicate that the right coverage is worse than the left coverage.


Figure 3.2: Plots of the difference between right and left coverage probability (CP) for the GCIs around $BC_3$, $\theta_1^*$, and $\theta_2^*$, showing the symmetry of the CIs for $n_j = 10$ (dotted line), $n_j = 50$ (dashed line), $n_j = 100$ (dash-dot line), and $n_j = 250$ (long dash line). Perfect symmetry would result in values of zero, and negative values indicate that the right coverage is worse than the left coverage.
probabilities for the 95% delta, generalized, and bootstrapped CIs for $BC_3$, $\theta_1^*$, and $\theta_2^*$ are presented in Tables B.7 through B.9. The symmetry of the delta and generalized CIs is shown in the last two rows of Figures 3.1 and 3.2, respectively.
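The Python sketch below makes the role of the cost matrix concrete. It evaluates BC for three normal classes at fixed thresholds under an arbitrary cost matrix, with entry [i][j] holding $C_{i|j}$ as in the matrix layout above. It assumes the same ordered threshold rule as the simulation (class 1 below $\theta_1$, class 3 above $\theta_2$) and does not re-minimize over the thresholds. The helper names are hypothetical.

```python
import math

COST_EQUAL = [[0, 3, 3], [3, 0, 3], [3, 3, 0]]  # C_{i|j} = 3, so C_{i|j} p_j = 1
COST_1 = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
COST_2 = [[0, 2, 5], [1, 0, 3], [1, 3, 0]]


def norm_cdf(x, mu, sigma):
    """Normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def confusion_probs(t1, t2, mus, sigmas):
    """P[i][j] = Pr(classified as class i+1 | true class j+1) for the ordered rule."""
    P = [[0.0] * 3 for _ in range(3)]
    for j, (mu, sigma) in enumerate(zip(mus, sigmas)):
        F1, F2 = norm_cdf(t1, mu, sigma), norm_cdf(t2, mu, sigma)
        P[0][j], P[1][j], P[2][j] = F1, F2 - F1, 1.0 - F2
    return P


def bayes_cost(P, cost, prev):
    """Sum over i != j of C_{i|j} * p_j * Pr(i | j)."""
    k = len(prev)
    return sum(cost[i][j] * prev[j] * P[i][j]
               for j in range(k) for i in range(k) if i != j)


P = confusion_probs(-0.5, 0.5, (-1.0, 0.0, 1.0), (1.0, 1.0, 1.0))
prev = [1 / 3] * 3
for name, cost in [("equal", COST_EQUAL), ("Cost1", COST_1), ("Cost2", COST_2)]:
    print(name, round(bayes_cost(P, cost, prev), 4))
```

With equal costs, this reproduces the plain sum of misclassification probabilities (about 1.234 for the first normal row of Table 3.1). Under $\text{Cost}_1$ and $\text{Cost}_2$, the equal-cost thresholds are generally no longer optimal, so an analysis like the one above would re-minimize BC over $(\theta_1, \theta_2)$ for each cost matrix.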

As with the different variance structures considered in Section 3.5.1, the varying cost structures do not noticeably affect the bias or coverage probability of the delta method or generalized CIs, for either BC or the optimal thresholds. The GCIs continue to outperform the other methods in coverage around BC at small n. For larger n, the delta and BCa CIs both have good coverage. The CIs around the optimal thresholds are longer at small n and at larger $BC_3$ values. Once again, the GCIs are the only CIs achieving the desired coverage for $n_j = 10$. For $n_j \geq 50$, all methods have good coverage and similar lengths.

The different cost structures alter the symmetry of the delta method CIs around the optimal thresholds. $\text{Cost}_1$ distributes the costs evenly across classes one and three, producing asymmetry around both optimal thresholds (the CI around $\theta_1^*$ is skewed right and the CI around $\theta_2^*$ is skewed left). $\text{Cost}_2$ assigns the highest costs to class three, the second highest to class two, and the lowest to class one. This skews the delta method CI around $\theta_1^*$ to the right, with no impact on the symmetry of the CI around $\theta_2^*$. Asymmetry of the delta method CIs around the optimal threshold caused by varying the cost structure was also noted in [30] (for a two-class scenario, using the GYI). Interestingly, the delta method CIs around BC maintain a fairly constant asymmetry for all cost and variance structures considered (Figure 3.1, column 1). Finally, the varying cost structures have some impact on the symmetry of the GCIs (Figure 3.2, rows 3 and 4), but this change is once again much smaller than that observed with the delta method CIs.

3.6  Summary 

The delta method and generalized CIs were derived for BC in a multi-class classification setting, assuming a single classification feature that is independently and normally distributed for each class. When this normality assumption is met, the simulations show that the delta method CIs have good coverage for sample sizes of 50 or more within each class. The GCIs have good coverage for sample sizes of 10 or more within each class. Notably, the BCa bootstrap CI with a parametric estimate of BC performs very similarly to the delta method CI around BC in most scenarios with a normal feature. The performance of the delta method, generalized, and BCa bootstrap CIs around BC degrades when the normality assumption is not met (for untransformed distributions).

The simulation also studies the derived CI methods around the optimal thresholds. The delta method and generalized CIs around the optimal thresholds perform well when the normality assumption is met, and they are more robust to changes in variance than the three bootstrap methods considered. When the normality assumption is not met, all CI methods around the optimal thresholds perform poorly, with slightly better performance for specific normal mixture distributions. In addition, all CI methods are more robust to departures from normality when constructed around BC than when constructed around the optimal thresholds.

Finally, for a normally distributed feature, the GCIs performed best with respect to coverage (all sample sizes), with lengths similar to those of the other methods. The GCIs are slightly longer in the small sample size scenarios ($n_j = 10$). However, they are the only method achieving the desired coverage at this sample size, so the longer length is expected. For a normally distributed feature, the GCIs are therefore recommended for all sample sizes and costs, and the delta method CIs may also be used for any large sample size and cost scenario.

When all $C_{i|j}p_j$ are equal for $i \neq j$, the performance of CIs around BC may be compared with that of CI methods for J, as these two metrics measure performance equivalently (see Theorem 1). More CI methods are currently available for J, although usually only for two classes. In general, the literature proposing CIs for J benchmarks the new methods against inconsistent bootstrap methods, making comparisons across all methods difficult.

In [36], several estimates of J were considered for the bootstrap CIs: parametric, empirical, Gaussian kernel smoothing, and kernel smoothing with the Sheather-Jones algorithm. However, only BP CIs were presented, and this chapter showed that BP CIs around BC perform well only for very large samples. In [64], both the empirical and parametric estimates of BC were considered for the bootstrap CIs, but again only the BP CI was used. Three bootstrap CI methods (BP, AN, and BCa) were used in [56]. However, the bootstraps used only empirical estimates of J and the optimal thresholds, rather than parametric estimates, for comparison with the delta method CIs, and the bootstrap CIs performed worse than the delta method CI. This is expected, since [45] showed that when classification systems result from a normally distributed feature, an empirical estimate of J has larger bias than the parametric estimate. Finally, [33] uses a parametric resample under the assumption of a single feature with independent normal distributions for each class, with fairly good results. All other methods discussed use a nonparametric resample of the data.

If the feature is assumed to be normally distributed, that assumption should arguably extend to the comparison methods. This suggests that parametric estimation of BC with a BCa bootstrap is the appropriate bootstrap method. This chapter showed that, for a CI around BC (or similarly J), a parametric estimate of BC with a BCa bootstrap CI performs very similarly to the delta method CI. It is therefore recommended when implementing a bootstrap CI for BC with a normally distributed feature. This bootstrap method outperforms those based on empirical estimates of BC or J, because the empirical estimate has higher bias than the parametric estimate [36, 45]. However, the BCa CI does not perform as well as the GCI around BC for a normally distributed feature with small sample sizes ($n_j = 10$). Nor does it perform as well as the delta method CI for accurate classification scenarios with a gamma distributed feature.
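The Python sketch below illustrates the recommended combination: a parametric (normal plug-in) estimate of $BC_3$ inside a BCa bootstrap. Several details are assumptions, not the exact implementation behind the reported simulations: equal costs and prevalences, three classes with ordered means, resampling within each class (a stratified bootstrap), a jackknife estimate of the acceleration, golden-section search in place of optim, and all helper names.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def golden_min(f, lo, hi, tol=1e-7):
    """Golden-section search for the minimizer of a unimodal f between lo and hi."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = min(lo, hi), max(lo, hi)
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def bc3_parametric(samples):
    """Plug-in BC3: fit N(mean, sd) to each class, then minimize over thresholds."""
    d1, d2, d3 = (NormalDist(fmean(s), stdev(s)) for s in samples)
    h1 = lambda t: 1.0 - d1.cdf(t) + d2.cdf(t)  # terms involving theta_1
    h2 = lambda t: 1.0 - d2.cdf(t) + d3.cdf(t)  # terms involving theta_2
    return h1(golden_min(h1, d1.mean, d2.mean)) + h2(golden_min(h2, d2.mean, d3.mean))


def bca_interval(samples, stat, B=1000, alpha=0.05, seed=1):
    """BCa interval for stat, resampling with replacement within each class."""
    rng = random.Random(seed)
    z = NormalDist()
    est = stat(samples)
    boot = sorted(stat([rng.choices(s, k=len(s)) for s in samples]) for _ in range(B))
    # Bias correction; the proportion is clamped so inv_cdf stays finite.
    prop = min(max(sum(v < est for v in boot) / B, 1.0 / B), 1.0 - 1.0 / B)
    z0 = z.inv_cdf(prop)
    # Acceleration from a leave-one-out jackknife over every observation.
    jack = []
    for j, sj in enumerate(samples):
        for i in range(len(sj)):
            reduced = list(samples)
            reduced[j] = sj[:i] + sj[i + 1:]
            jack.append(stat(reduced))
    jbar = fmean(jack)
    num = sum((jbar - v) ** 3 for v in jack)
    den = 6.0 * sum((jbar - v) ** 2 for v in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def percentile(q):
        """Linearly interpolated q-quantile of the sorted bootstrap values."""
        pos = q * (B - 1)
        lo_i = int(math.floor(pos))
        hi_i = min(lo_i + 1, B - 1)
        return boot[lo_i] + (pos - lo_i) * (boot[hi_i] - boot[lo_i])

    def adjusted(q):
        """BCa-adjusted percentile level for nominal level q."""
        zq = z.inv_cdf(q)
        return z.cdf(z0 + (z0 + zq) / (1.0 - a * (z0 + zq)))

    return est, percentile(adjusted(alpha / 2)), percentile(adjusted(1 - alpha / 2))


rng = random.Random(2024)
data = [[rng.gauss(mu, 1.0) for _ in range(30)] for mu in (-2.0, 0.0, 2.0)]
est, lo, hi = bca_interval(data, bc3_parametric, B=500)
print(round(est, 3), round(lo, 3), round(hi, 3))
```

The jackknife loop re-estimates BC once per observation. Its cost therefore grows with $\sum_j n_j$ as well as with B, which is worth keeping in mind for large simulation studies.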

Another result of interest from the simulation study is the consistent left skew of the delta method CI around BC under all distributional and cost structures considered. This appears to result from BC being defined as a minimum of the misclassification rates (subject to prevalence and cost multipliers). This skewness is not seen with the GCI. Although asymmetric, the delta method CIs still achieve the desired coverage probabilities, so the asymmetry is not necessarily a concern. The delta method CIs around the optimal thresholds are symmetric when the feature's distribution has equal variance across classes and the cost structure is balanced. As might be expected, changing the variance or cost structure affects the symmetry of the delta method CIs around the optimal thresholds. In [30], asymmetry of the delta method CI around the optimal threshold was also noted when using the GYI for varying values of R (the prevalence and cost/benefit ratio), in the two-class framework with a normally (or log-normally) and independently distributed feature. Again, although the symmetry of the delta method CI changes, its coverage still meets the desired levels for large n. Much smaller asymmetries are observed with the GCIs. Because symmetry is expected to behave similarly in other comparable scenarios for BC and threshold CIs, it is not examined further for the other methods.

Numerical estimation of the partial derivatives required by the delta method makes the delta method CIs in this chapter more practical to apply, especially for k > 3 classes. The methods presented in this chapter are especially useful because transformation techniques, such as the Box-Cox transformation employed here, can transform data to normality to meet the required assumptions, provided the underlying distributions lie in the Box-Cox family [45]. Section 3.5 showed that, for data transformed to normality, the delta and generalized CIs around BC perform well with respect to coverage, while the CIs for the optimal thresholds do not. This further illustrates the usefulness of the CI around BC for choosing the best classifying feature, even when the optimal thresholds require further study to be determined accurately.
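The Box-Cox transformation mentioned above can be sketched in Python as follows. It chooses $\lambda$ for a single positive sample by maximizing the usual profile log-likelihood $-\tfrac{n}{2}\log\hat\sigma^2(\lambda) + (\lambda - 1)\sum_i \log x_i$ over a grid. This section does not state how $\lambda$ was selected in the simulations (for example, per class or pooled). That choice, the grid, and the function names are therefore assumptions.

```python
import math
from statistics import NormalDist, pvariance


def box_cox(x, lam):
    """Box-Cox transform of a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def profile_loglik(data, lam):
    """Normal profile log-likelihood of lambda, including the Jacobian term."""
    y = [box_cox(x, lam) for x in data]
    n = len(y)
    return -0.5 * n * math.log(pvariance(y)) + (lam - 1.0) * sum(math.log(x) for x in data)


def fit_lambda(data, grid=None):
    """Grid-search estimate of lambda (default grid -2.00, -1.99, ..., 2.00)."""
    grid = grid if grid is not None else [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: profile_loglik(data, lam))


# Deterministic log-normal-shaped data: exponentiated standard normal quantiles.
data = [math.exp(NormalDist().inv_cdf((i - 0.5) / 200)) for i in range(1, 201)]
lam = fit_lambda(data)
transformed = [box_cox(x, lam) for x in data]
print(lam)
```

For these log-normal-shaped data the selected $\lambda$ should be at or very near 0, the log transform. A grid search is used only for transparency; a one-dimensional optimizer would serve equally well.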
IV. Nonparametric Confidence Intervals


4.1  Introduction 

This chapter derives a CI for BC, for any number of classes k, that does not require information about the structure of the classification system or the feature distributions. A nonparametric method for constructing a CI around BC is useful because small data sets, and classifiers for which distributional assumptions are not suitable, occur regularly [5, 35, 37, 38, 57, 59]. Current nonparametric methods for J require large sample sizes (see Section 2.5.1). No distributional assumption is placed on the underlying feature(s). However, the classification outcomes from each class, resulting from a fixed $\theta \in \Theta$, are modeled with independent multinomial distributions.

Section 4.2 develops this nonparametric CI around BC using the fiducial argument. Section 4.3 presents available bootstrap methods for constructing a CI around BC in the nonparametric framework. In Section 4.4, simulations demonstrate the performance of the method developed in Section 4.2 and compare it with other available CIs around BC in two- and three-class scenarios. Both scenarios where the underlying classification system is unknown and scenarios with known normal distributions are considered. In Section 4.5, the new method is further compared with available methods for constructing simultaneous CIs around multinomial probabilities. Section 4.6 summarizes the results.

4.2  Fiducial  Intervals 

This section develops a CI for BC that requires no distributional assumptions about the underlying classification system. The CI is developed using the fiducial approach, first introduced in 1930 by R.A. Fisher in his paper "Inverse Probability" [21]. The fiducial argument has been used successfully for similar inference on statistical parameters [31, 74, 78]. A very popular example is the Clopper-Pearson CI¹ for a binomial proportion (see Section 2.7.1.1) [13, 72]. The method developed in this section may be implemented for any (small) sample, k-class classification system and has a minimum coverage of $(1 - \alpha)100\%$.

¹Or fiducial interval, as these two terms are used interchangeably by Clopper and Pearson [13, 72].
The proposed CI requires only the observed classification outcomes and assumes that the outcomes are multinomially distributed. Section 4.2.1 derives the proposed method using the fiducial argument for the k-class BC with all $C_{i|j}p_j$ equal for $i \neq j$. In Section 4.2.2, the method is extended to BC with unequal costs and prevalences. Section 4.2.3 presents an algorithm for computing the upper and lower bounds, and Section 4.2.3.2 shows an equivalence to a multiple of the Clopper-Pearson CI under specific conditions. Finally, Section 4.2.4 shows that this method may be used equivalently for J.


Definition 3 (Fiducial Interval). A $(1 - \alpha)100\%$ fiducial interval for a parameter $\theta$ is the set of values of $\theta$ that could have given rise to the observed value $Y = y$ with the specified probability $1 - \alpha$. Here $Y = t(X_1, \ldots, X_n)$ is a statistic from the random sample $X_1, \ldots, X_n$, with distribution $F_Y(y \mid \theta)$ [72].


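Therefore, a $(1 - \alpha)100\%$ fiducial interval for a parameter $\theta$ derived from an observed statistic $Y = t(X_1, \ldots, X_n)$ can be found as the solutions for $\theta_L$ and $\theta_U$ in the following equations [72]:

$$\Pr(Y \geq y \mid \theta_L) = \frac{\alpha}{2} \qquad (4.1)$$

$$\Pr(Y \leq y \mid \theta_U) = \frac{\alpha}{2} \qquad (4.2)$$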
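Equations 4.1 and 4.2 can be illustrated with the Clopper-Pearson case, where $Y$ is a binomial count and $\theta$ is the success probability. The Python sketch below solves each equation by bisection on the exact binomial cdf. It illustrates the fiducial equations for a single binomial proportion only; it is not the multi-class BC algorithm of Section 4.2.3. The function names are hypothetical.

```python
from math import comb


def binom_cdf(y, n, p):
    """Pr(Y <= y) for Y ~ Bin(n, p)."""
    return sum(comb(n, x) * p ** x * (1.0 - p) ** (n - x) for x in range(y + 1))


def solve_monotone(g, target, increasing, iters=100):
    """Bisection on [0, 1] for g(p) = target, with g monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) < target) == increasing:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return 0.5 * (lo + hi)


def clopper_pearson(y, n, alpha=0.05):
    """Solve Pr(Y >= y | p_L) = alpha/2 (Eq. 4.1) and Pr(Y <= y | p_U) = alpha/2 (Eq. 4.2)."""
    lower = 0.0 if y == 0 else solve_monotone(
        lambda p: 1.0 - binom_cdf(y - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if y == n else solve_monotone(
        lambda p: binom_cdf(y, n, p), alpha / 2, increasing=False)
    return lower, upper


print(clopper_pearson(3, 10))
```

For $y = 0$, the upper bound has the closed form $1 - (\alpha/2)^{1/n}$. For $y = n$, the lower bound is $(\alpha/2)^{1/n}$. The bisection reproduces both, which gives a quick check of the implementation.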

4.2.1  Bayes  Cost  with  Equal  Weights. 

Initially, it is assumed that all $C_{i|j}p_j$ are equal to one for $i \neq j$. BC can then be expressed as the sum of the $k^2 - k$ misclassification probabilities resulting from the k-class classification system. The minimization is omitted here because the classifier is assumed to be applied at its optimal setting, or more generally at a fixed setting. Specifically,

$$BC = \sum_{j=1}^{k} \sum_{\substack{i=1 \\ i \neq j}}^{k} p_{i|j} \qquad (4.3)$$


where each $p_{i|j}$ is the probability of classifying an observation from class $j$ as class $i$ ($j = 1, \ldots, k$ and $i = 1, \ldots, k$). The statistic used to estimate BC is $Y = \widehat{BC}$, where


$$\widehat{BC} = \sum_{j=1}^{k} \sum_{\substack{i=1 \\ i \neq j}}^{k} \frac{X_{i|j}}{n_j} \qquad (4.4)$$


As long as all multipliers on each misclassification probability are equal, the multiplier can be scaled to one without changing the classification outcomes.
Each $X_{i|j}$ is a multinomial random variable representing the number of observations classified as class $i$ when their true class is $j$, and $n_j$ is the total number of observations from class $j$.
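Equation 4.4 translates directly into code. In this Python sketch, the confusion counts are stored with rows indexing the assigned class $i$ and columns the true class $j$, so counts[i][j] is $X_{i|j}$ (an assumed layout). Exact fractions are used, which reflects that $\widehat{BC}$ takes rational values. The function name is hypothetical.

```python
from fractions import Fraction


def bc_hat(counts):
    """Equal-weight BC estimate: sum over classes j of (misclassified from j) / n_j.

    counts[i][j] holds X_{i|j}, the number of class-j observations assigned to class i.
    """
    k = len(counts)
    total = Fraction(0)
    for j in range(k):
        n_j = sum(counts[i][j] for i in range(k))
        misclassified = n_j - counts[j][j]
        total += Fraction(misclassified, n_j)
    return total


counts = [[8, 1, 0],
          [2, 7, 3],
          [0, 2, 7]]
print(bc_hat(counts))  # 2/10 + 3/10 + 3/10 = 4/5
```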

The statistic $Y$ is a function of discrete random variables, representing a projection into the one-dimensional rational space ($\mathbb{Q}$), and can be ordered (possibly with ties). From Equations 4.1 and 4.2, the $(1 - \alpha)100\%$ fiducial interval for BC from an observed statistic $Y = y$ is determined by the values $BC_L$ and $BC_U$ that solve the following equations:

$$\Pr(Y \geq y \mid BC_L) = \frac{\alpha}{2} \qquad (4.5)$$

$$\Pr(Y \leq y \mid BC_U) = \frac{\alpha}{2} \qquad (4.6)$$


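To find these solutions, the probability distribution of $Y$ with respect to BC must be determined. For each class, $\mathbf{X}_j = (X_{1|j}, \ldots, X_{k|j}) \sim \text{multinomial}(\mathbf{p}_j, n_j)$, where each $X_{i|j}$ is a nonnegative integer and $\sum_{i=1}^{k} X_{i|j} = n_j$. The multinomial pmf for $\mathbf{X}_j$ is of the form [12]

$$f_{\mathbf{X}_j}(\mathbf{x}_j) = n_j! \prod_{i=1}^{k} \frac{p_{i|j}^{x_{i|j}}}{x_{i|j}!} \qquad (4.7)$$

Therefore, the joint pmf for all $k^2$ random variables, $\mathbf{X} = (\mathbf{X}_1, \ldots, \mathbf{X}_k)$, from the k independent multinomial distributions resulting from the k-class classifier is

$$f_{\mathbf{X}}(\mathbf{x} \mid \mathbf{p}) = \prod_{j=1}^{k} f_{\mathbf{X}_j}(\mathbf{x}_j) = \prod_{j=1}^{k} n_j! \prod_{i=1}^{k} \frac{p_{i|j}^{x_{i|j}}}{x_{i|j}!} \qquad (4.8)$$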


Let $S$ represent the probability parameter space for the entire experiment:

$$\mathbf{p} \in S = \left\{\mathbf{p} = (\mathbf{p}_1, \ldots, \mathbf{p}_k) : \mathbf{p}_j = (p_{1|j}, \ldots, p_{k|j}),\ p_{i|j} \geq 0,\ \text{and } \sum_{i=1}^{k} p_{i|j} = 1\right\}$$

Also let $\mathcal{X}$ be the joint multinomial sample space, that is, the set of $1 \times k^2$ sized vectors

$$\mathcal{X} = \left\{\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_k) : \mathbf{x}_j = (x_{1|j}, \ldots, x_{k|j}),\ x_{i|j} \in \mathbb{Z}_{\geq 0},\ \sum_{i=1}^{k} x_{i|j} = n_j\right\}$$

For a single multinomial distribution, there are

$$\binom{n + k - 1}{n} \qquad (4.9)$$

distinct elements in the sample space. For example, for $k = 3$ outcomes and $n = 2$ observations from a single multinomial experiment, there are six elements (shown in Table 4.1). In addition, for
Table 4.1: The multinomial sample space for 3 outcomes and n = 2. Each row represents a potential draw from the multinomial experiment.

X_{1|j}    X_{2|j}    X_{3|j}
2          0          0
1          0          1
1          1          0
0          0          2
0          1          1
0          2          0

a k-class classification system, there are

$$\prod_{j=1}^{k} \binom{n_j + k - 1}{n_j} \qquad (4.10)$$

distinct ways of sampling from this joint multinomial experiment (i.e., the number of elements in $\mathcal{X}$). Clearly, this sample space becomes large as k and each $n_j$ increase. For the previous example where $k = 3$, if each $n_j = 2$, there are $6^3 = 216$ distinct ways of sampling from the joint multinomial experiment.
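The counts in Equations 4.9 and 4.10 can be checked by brute-force enumeration. The Python sketch below lists one class's multinomial sample space (the rows of Table 4.1, in a different order) and compares the size of the joint space with the closed form. The function names are hypothetical.

```python
from itertools import product
from math import comb, prod


def multinomial_outcomes(n, k):
    """All k-tuples of nonnegative integers that sum to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n, -1, -1):
        for rest in multinomial_outcomes(n - first, k - 1):
            yield (first,) + rest


def joint_space_size(ns, k):
    """Equation 4.10: product over classes of C(n_j + k - 1, n_j)."""
    return prod(comb(n + k - 1, n) for n in ns)


one_class = list(multinomial_outcomes(2, 3))
print(one_class)                       # the six rows of Table 4.1, reordered
joint = list(product(one_class, repeat=3))
print(len(joint), joint_space_size([2, 2, 2], 3))  # 216 216
```

Enumeration is only practical for tiny examples like this one. The closed form in Equation 4.10 is what shows how quickly the joint space grows.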


With all $C_{i|j}p_j$ assumed equal for $i \neq j$, the sum of the $k - 1$ misclassification rates for each class may be treated as a total misclassification rate for that class. BC can then be defined using the total misclassifications only, as it is unnecessary to distinguish between the types of misclassification (e.g., $X_{2|1}$ vs. $X_{3|1}$). For simplicity of notation, the sum of the $k - 1$ misclassification probabilities from each class is denoted $p_{f|j}$:

$$p_{f|j} = \sum_{\substack{i=1 \\ i \neq j}}^{k} p_{i|j} \qquad (4.11)$$

The total number of misclassified observations from each class is denoted $X_{f|j}$:

$$X_{f|j} = \sum_{\substack{i=1 \\ i \neq j}}^{k} X_{i|j} \qquad (4.12)$$


The independent multinomial distributions can be collapsed into k independent binomial distributions, with the total misclassifications representing success and the correct classifications representing failure in each class. Thus, for each fixed $j$,

$$\begin{aligned} \sum_{i=1}^{k} X_{i|j} = n_j \;&\Rightarrow\; X_{f|j} = \#\text{ of misclassifications} = n_j - \#\text{ of correct classifications} \\ &\Rightarrow\; X_{f|j} \sim \text{Bin}(n_j, p_{f|j}) \end{aligned} \qquad (4.13)$$

Considering only the total misclassifications for each class (modeled as independent binomial random variables), the size of the sample space for the classification system is reduced to

$$\prod_{j=1}^{k} (n_j + 1) \tag{4.14}$$

The reduction of the sample space is demonstrated in Table 4.2 with a single class from the previous example, where $k = 3$ and $n = 2$. In this example, the number of elements in the sample space for one class is reduced from six (multinomial sample space) to three (binomial sample space).
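The reduction from Equation 4.10 to Equation 4.14 can be checked the same way. In this sketch (hypothetical helper `collapse`; class indices start at 0), each class-1 draw from Table 4.2 is mapped to its pair (correct, total misclassified), and the joint binomial sample space for $k = 3$, $n_j = 2$ is enumerated.

```python
from itertools import product
from math import comb, prod


def collapse(draw, j):
    """Map a class-j draw (x_{1|j}, ..., x_{k|j}) to (x_{j|j}, x_{j^c|j})."""
    correct = draw[j]
    return correct, sum(draw) - correct


# Class-1 draws from Table 4.2 (index 0 is the correct class).
class1_draws = [(2, 0, 0), (1, 0, 1), (1, 1, 0), (0, 0, 2), (0, 1, 1), (0, 2, 0)]
binomial_class1 = sorted({collapse(d, 0) for d in class1_draws}, reverse=True)
print(binomial_class1)  # [(2, 0), (1, 1), (0, 2)]

ns = [2, 2, 2]
k = len(ns)
binomial_space = list(product(*(range(n + 1) for n in ns)))  # vectors (x_{1^c|1}, ..., x_{k^c|k})
print(len(binomial_space), prod(n + 1 for n in ns))           # Equation 4.14: 27 27
print(prod(comb(n + k - 1, k - 1) for n in ns))               # Equation 4.10: 216
```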

Table 4.2: A multinomial sample space reduced to a binomial sample space for 3 outcomes and $n = 2$. Each row represents a potential draw from the experiment (assuming the truth class is 1, therefore $X_{1|1}$ is the correct classification).

$$\begin{array}{ccc}
X_{1|1} & X_{2|1} & X_{3|1} \\ \hline
2 & 0 & 0 \\
1 & 0 & 1 \\
1 & 1 & 0 \\
0 & 0 & 2 \\
0 & 1 & 1 \\
0 & 2 & 0
\end{array}
\qquad
\begin{array}{cc}
X_{1|1} & X_{2|1} + X_{3|1} \\ \hline
2 & 0+0=0 \\
1 & 0+1=1 \\
1 & 1+0=1 \\
0 & 0+2=2 \\
0 & 1+1=2 \\
0 & 2+0=2
\end{array}
\qquad
\begin{array}{cc}
X_{1|1} & X_{1^c|1} \\ \hline
2 & 0 = n - 2 \\
1 & 1 = n - 1 \\
0 & 2 = n - 0
\end{array}$$

Therefore, the joint pmf for the independent multinomial random variables, $\mathbf{X} = (X_{1|1}, X_{2|1}, \ldots, X_{k|k})$, can be expressed using the joint pmf for $k$ independent binomial random variables, $\mathbf{X} = (X_{1^c|1}, \ldots, X_{k^c|k})$, where each $X_{j^c|j}$ is a nonnegative integer and $0 \le X_{j^c|j} \le n_j$:

$$\mu(\mathbf{x} \mid \mathbf{p}) = \prod_{j=1}^{k} \binom{n_j}{x_{j^c|j}}\, p_{j^c|j}^{\,x_{j^c|j}}\, q_{j^c|j}^{\,n_j - x_{j^c|j}} \tag{4.15}$$

Here, $q_{j^c|j} = 1 - p_{j^c|j}$ and $\mathbf{p} = (p_{1^c|1}, \ldots, p_{k^c|k})$ is a vector of the $k$ total misclassification probabilities from the classification system.

Recall that $\mathcal{M}$ is the joint multinomial sample space. Let the reduced sample space, $\mathcal{S}$, be the joint binomial sample space that is the set of $1 \times k$ sized vectors where $\mathcal{S} = \{\mathbf{x} = (x_{1^c|1}, \ldots, x_{k^c|k}) : x_{j^c|j} \in \mathbb{Z}_{\ge 0},\ x_{j^c|j} \le n_j\}$. Then the sample space for $Y = \widehat{BC}$ is $\mathcal{Y} = \{y : y = \sum_{j=1}^{k} x_{j^c|j}/n_j,\ \mathbf{x} \in \mathcal{S}\}$. Therefore, the pmf of $Y$ with respect to the binomial probabilities $\mathbf{p} = (p_{1^c|1}, \ldots, p_{k^c|k})$ can be written in terms of the joint binomial distribution as

$$\begin{aligned}
f_Y(y \mid \mathbf{p}) &= P(Y = y \mid \mathbf{p}) \\
&= P\left(\sum_{j=1}^{k} \frac{X_{j^c|j}}{n_j} = y \;\middle|\; \mathbf{p}\right) \\
&= \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ Y = y}} \mu(\mathbf{x} \mid \mathbf{p})
\end{aligned} \tag{4.16}$$

where $\mu(\mathbf{x} \mid \mathbf{p})$ is defined in Equation 4.15. The last line in Equation 4.16 is a summation because it is possible to have more than one $\mathbf{x} \in \mathcal{S}$ that results in $Y = y$ (these are ties in the ordered sample space). For example, if $k = 3$ and each $n_j = 2$, an observed $\widehat{BC} = 0.5$ will occur if there is one misclassification out of the total six observations. There are three ways of observing only one misclassification from this experiment, resulting in the ties in the sample space for $Y = y$. These ties are shown in Table 4.3.
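The pmf in Equation 4.16 and the ties in Table 4.3 can be reproduced by brute-force enumeration of $\mathcal{S}$. This is a minimal sketch assuming equal weights, so that $Y = \sum_j X_{j^c|j}/n_j$; exact fractions are used for $y$ so that tied elements compare equal. The probabilities in `p` are arbitrary illustrative values, not taken from the dissertation.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def pmf_of_y(ns, p):
    """Return f_Y(y | p) (Equation 4.16) and the elements of S tied at each y."""
    pmf = defaultdict(float)
    ties = defaultdict(list)
    for x in product(*(range(n + 1) for n in ns)):
        y = sum(Fraction(xj, nj) for xj, nj in zip(x, ns))
        mu = 1.0  # mu(x | p) from Equation 4.15
        for xj, nj, pj in zip(x, ns, p):
            mu *= binom_pmf(xj, nj, pj)
        pmf[y] += mu
        ties[y].append(x)
    return dict(pmf), dict(ties)


pmf, ties = pmf_of_y(ns=[2, 2, 2], p=[0.1, 0.2, 0.3])
print(ties[Fraction(1, 2)])            # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
print(sorted(float(y) for y in pmf))   # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```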


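Using Equation 4.16, the CDF of $Y$ with respect to $\mathbf{p} = (p_{1^c|1}, \ldots, p_{k^c|k})$ is

$$F_Y(y \mid \mathbf{p}) = \sum_{\substack{t \in \mathcal{Y} \\ t \le y}} f_Y(t \mid \mathbf{p}) = \sum_{\substack{t \in \mathcal{Y} \\ t \le y}} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \tag{4.17}$$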
Table 4.3: Ties in the joint binomial sample space for 3 classes, $n_j = 2$, and $\widehat{BC} = 0.5$. Each row represents an element from the joint binomial experiments.

$$\begin{array}{cccc}
\text{Class 1 } (X_{1^c|1}) & \text{Class 2 } (X_{2^c|2}) & \text{Class 3 } (X_{3^c|3}) & \widehat{BC} \\ \hline
0 & 0 & 1 & 0.5 \\
0 & 1 & 0 & 0.5 \\
1 & 0 & 0 & 0.5
\end{array}$$

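For each fixed BC, there exist infinitely many $\mathbf{p} = (p_{1^c|1}, \ldots, p_{k^c|k})$ such that $\mathbf{p}^T\mathbf{1} = BC$ (where $\mathbf{1}$ is a $k \times 1$ vector of ones), resulting in different values of $F_Y(y \mid \mathbf{p})$ for a given BC and observed $y = \widehat{BC}$ (except for the trivial cases where $BC = 0$ or $BC = k$). This makes finding a unique solution for the fiducial bounds on BC, given in Equations 4.5 and 4.6, impossible. To demonstrate multiple values of $F_Y(y \mid \mathbf{p})$ for each fixed BC, an example where $y = 0.5$ (left) and $y = 1$ (right) is shown in Figure 4.1. This example plots $F_Y(y \mid \mathbf{p})$ (Equation 4.17, plotted with black dots) against BC for multiple $\mathbf{p}$ with $\mathbf{p}^T\mathbf{1} = BC$. Therefore, define $\overline{F}_Y(y \mid BC)$ to be the maximum value of $F_Y(y \mid \mathbf{p})$ for each fixed $BC = \mathbf{p}^T\mathbf{1}$, and $\underline{F}_Y(y \mid BC)$ to be the minimum value of $F_Y(y \mid \mathbf{p})$ for each fixed $BC = \mathbf{p}^T\mathbf{1}$. Then these two functions are one-to-one and onto from BC to the $F_Y(y \mid \mathbf{p})$ space, and unique solutions for the fiducial bounds can be found. These two new functions are shown in Figure 4.1, where the blue line is $\overline{F}_Y(y \mid BC)$ and the red line is $\underline{F}_Y(y \mid BC)$. These functions can be expressed using Equation 4.17 as

$$\overline{F}_Y(y \mid BC) = \max_{\substack{\mathbf{p}:\, \mathbf{p}^T\mathbf{1} = BC \\ BC \in \mathcal{B}}} \sum_{t \le y} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \tag{4.18}$$

$$\underline{F}_Y(y \mid BC) = \min_{\substack{\mathbf{p}:\, \mathbf{p}^T\mathbf{1} = BC \\ BC \in \mathcal{B}}} \sum_{t \le y} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \tag{4.19}$$

where $\mathcal{B}$ is the BC parameter space such that $\mathcal{B} = \{BC : BC = \mathbf{p}^T\mathbf{1},\ \mathbf{p} = (p_{1^c|1}, \ldots, p_{k^c|k}),\ p_{j^c|j} \in [0,1]\}$.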
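Because infinitely many $\mathbf{p}$ share the same BC, $\overline{F}_Y$ and $\underline{F}_Y$ in Equations 4.18 and 4.19 have to be approximated numerically. The sketch below is our own illustration, not the author's code: it restricts each $p_{j^c|j}$ to the grid $\{0, 1/m, \ldots, 1\}$ and takes the maximum and minimum of Equation 4.17 over grid points with $\mathbf{p}^T\mathbf{1} = s/m$. The grid is an assumption and only approximates the continuous maximum and minimum.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def cdf_y(y, ns, p):
    """F_Y(y | p) from Equation 4.17 with Y = sum_j X_{j^c|j} / n_j (equal weights)."""
    total = 0.0
    for x in product(*(range(n + 1) for n in ns)):
        if sum(Fraction(xj, nj) for xj, nj in zip(x, ns)) <= y:
            mu = 1.0
            for xj, nj, pj in zip(x, ns, p):
                mu *= binom_pmf(xj, nj, pj)
            total += mu
    return total


def f_bar_and_f_under(y, ns, s, m):
    """Grid approximation of (max, min) of F_Y(y | p) over p with p^T 1 = s/m."""
    values = [cdf_y(y, ns, [i / m for i in idx])
              for idx in product(range(m + 1), repeat=len(ns))
              if sum(idx) == s]
    return max(values), min(values)


# k = 3, n_j = 2, observed y = 0.5, BC = 1 on a grid with step 0.1
print(f_bar_and_f_under(Fraction(1, 2), [2, 2, 2], s=10, m=10))
```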
Figure 4.1: Example of $F_Y(y \mid \mathbf{p})$ vs BC for an observed $y = 0.5$ (left) and $y = 1$ (right).


Combining Equations 4.18 and 4.19 with Equations 4.5 and 4.6, the lower ($BC_L$) and upper ($BC_U$) bounds for the $(1-\alpha)100\%$ fiducial interval for BC from an observed statistic $y$ are:

$$BC_L = \sup\left\{ BC \in \mathcal{B} \text{ such that } 1 - \min_{\mathbf{p}:\, \mathbf{p}^T\mathbf{1} = BC} \sum_{t \le y^*} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \le \frac{\alpha}{2} \right\} \tag{4.20}$$

$$BC_U = \inf\left\{ BC \in \mathcal{B} \text{ such that } \max_{\mathbf{p}:\, \mathbf{p}^T\mathbf{1} = BC} \sum_{t \le y} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \le \frac{\alpha}{2} \right\} \tag{4.21}$$

where $y^*$ is the ordered value of $Y \in \mathcal{Y}$ directly less than $y$. When $y = 0$ or $y = k$, the lower bound is $BC_L = 0$ or the upper bound is $BC_U = k$, respectively. This is because $Y \in [0, k]$ when all $c_{i|j}p_j$ are assumed equal to one, for $i \ne j$, making $\Pr(Y \ge 0 \mid BC) = 1$ and $\Pr(Y \le k \mid BC) = 1$. The upper and lower bounds expressed in Equations 4.20 and 4.21 may be found by searching all $\mathbf{p}$ within a certain tolerance, which motivates using inequalities to meet the minimum coverage desired. The coverage of this CI is addressed in the following theorem.
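The bounds in Equations 4.20 and 4.21 can be approximated by brute force. The sketch below is ours and rests on stated assumptions: equal weights ($c_{i|j}p_j = 1$), each $p_{j^c|j}$ restricted to the grid $\{0, 1/m, \ldots, 1\}$, and bounds reported on the implied grid of BC values. Since $1 - \min \Pr(Y \le y^* \mid \mathbf{p}) = \max \Pr(Y \ge y \mid \mathbf{p})$, the lower bound is computed from $\Pr(Y \ge y \mid \mathbf{p})$ directly.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def fiducial_interval_equal_weights(x_obs, ns, alpha=0.05, m=20):
    """Grid-search sketch of Equations 4.20-4.21 (equal weights, c_{i|j} p_j = 1).

    x_obs: observed total misclassifications per class (x_{1^c|1}, ..., x_{k^c|k}).
    Each p_{j^c|j} lies on {0, 1/m, ..., 1}, so BC = p^T 1 is searched in steps of 1/m.
    """
    k = len(ns)
    space = list(product(*(range(n + 1) for n in ns)))
    ys = [sum(Fraction(xj, nj) for xj, nj in zip(x, ns)) for x in space]
    y = sum(Fraction(xj, nj) for xj, nj in zip(x_obs, ns))

    # For each grid value BC = s/m: max_p Pr(Y <= y | p) and max_p Pr(Y >= y | p).
    max_le = [0.0] * (k * m + 1)
    max_ge = [0.0] * (k * m + 1)
    for idx in product(range(m + 1), repeat=k):
        p = [i / m for i in idx]
        le = ge = 0.0
        for x, yx in zip(space, ys):
            mu = 1.0
            for xj, nj, pj in zip(x, ns, p):
                mu *= binom_pmf(xj, nj, pj)
            if yx <= y:
                le += mu
            if yx >= y:
                ge += mu
        s = sum(idx)
        max_le[s] = max(max_le[s], le)
        max_ge[s] = max(max_ge[s], ge)

    # Trivial bounds when y = 0 (lower) or y = k (upper).
    lower = 0.0 if y == 0 else max(
        (s / m for s in range(k * m + 1) if max_ge[s] <= alpha / 2), default=0.0)
    upper = float(k) if y == k else min(
        (s / m for s in range(k * m + 1) if max_le[s] <= alpha / 2), default=float(k))
    return lower, upper


print(fiducial_interval_equal_weights((1, 1), (5, 5)))
print(fiducial_interval_equal_weights((0, 0), (5, 5)))  # (0.0, 0.65)
```

With no misclassifications in two classes of five, this grid (step 0.05 in BC) returns an upper bound of 0.65; a finer grid (larger `m`) moves the bound closer to its continuous value at the cost of more enumeration.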
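Theorem 5. The upper and lower bounds for BC given by

$$BC_L = \sup\left\{ BC \in \mathcal{B} \text{ such that } 1 - \min_{\mathbf{p}:\, \mathbf{p}^T\mathbf{1} = BC} \sum_{t \le y^*} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \le \frac{\alpha}{2} \right\} \tag{4.20}$$

$$BC_U = \inf\left\{ BC \in \mathcal{B} \text{ such that } \max_{\mathbf{p}:\, \mathbf{p}^T\mathbf{1} = BC} \sum_{t \le y} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \le \frac{\alpha}{2} \right\} \tag{4.21}$$

create a $(1-\alpha)100\%$ fiducial interval around BC when weights on misclassification costs are equal, with a confidence coefficient of at least $(1-\alpha)100\%$.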


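Proof. Let $BC \in \mathcal{B}$, $y = \widehat{BC}$, and $\mathbf{p} = (p_{1^c|1}, \ldots, p_{k^c|k})$ be the $k$ joint binomial total misclassification probabilities from a $k$-class classification system. Since $BC = \sum_{j=1}^{k} p_{j^c|j}$, a small increase of $\epsilon$ in any one $p_{j^c|j}$ will result in an increase of $\epsilon$ in BC. For the upper bound this results in

$$\Pr(Y \le y \mid BC_U) \le \max_{\substack{\mathbf{p}:\, \mathbf{p}^T\mathbf{1} = BC_U \\ BC_U \in \mathcal{B}}} \sum_{t \le y} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \;\Longrightarrow\; \Pr(Y \le y \mid BC_U) \le \frac{\alpha}{2} \tag{4.22}$$

Now let $y^*$ be the ordered value of $Y \in \mathcal{Y}$ directly less than $y$. Then for the lower bound,

$$\Pr(Y \ge y \mid BC_L) \le 1 - \min_{\substack{\mathbf{p}:\, \mathbf{p}^T\mathbf{1} = BC_L \\ BC_L \in \mathcal{B}}} \sum_{t \le y^*} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \;\Longrightarrow\; \Pr(Y \ge y \mid BC_L) \le \frac{\alpha}{2} \tag{4.23}$$

The confidence coefficient for any CI is given generally in [72] as

$$\Pr(\theta_L \le \theta \le \theta_U) = \Pr(Y \le y \mid \theta_L) - \Pr(Y \le y \mid \theta_U) \tag{4.24}$$

Therefore, for the fiducial interval around BC, using Equations 4.22 and 4.23 (and $\Pr(Y > y \mid BC_L) \le \Pr(Y \ge y \mid BC_L)$),

$$\begin{aligned}
\Pr(BC_L \le BC \le BC_U) &= \Pr(Y \le y \mid BC_L) - \Pr(Y \le y \mid BC_U) \\
&= 1 - \Pr(Y > y \mid BC_L) - \Pr(Y \le y \mid BC_U) \\
&\ge 1 - \frac{\alpha}{2} - \frac{\alpha}{2} = 1 - \alpha \\
\Longrightarrow\ \Pr\big(BC \in [BC_L, BC_U] \mid y\big) &\ge 1 - \alpha
\end{aligned} \tag{4.25}$$

□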
The proof for Theorem 5 does not depend on the sample size used to develop the fiducial interval. Therefore, the minimum desired coverage of $(1-\alpha)100\%$ will be met for any sample size, making this method appropriate for small samples where approximate methods fail to achieve the necessary coverage. Also, this method relies on ordering the sample space of the $k$ joint independent binomial distributions. This sample space becomes large as $k$ and each $n_j$ increase, so the method is not only suitable for small samples but also most practical there.

4.2.2  Bayes  Cost  with  Unequal  Weights. 

When all $c_{i|j}p_j$ are not equal, for $i \ne j$, the method for finding the fiducial interval around BC becomes more involved compared to when all multipliers are equal. First, the outcomes from the classification system can no longer be reduced to binomial random variables. BC is more generally defined in this scenario as

$$BC = \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} c_{i|j}\, p_j\, p_{i|j} \tag{4.26}$$

where each $p_{i|j}$ is the probability of classifying an observation from class $j$ as class $i$, $p_j$ is the prevalence of class $j$, and $c_{i|j}$ is the cost associated with classifying class $j$ as class $i$ ($j = 1, \ldots, k$ and $i = 1, \ldots, k$). The minimization is excluded because it is assumed the classification system is applied at its optimal settings. The statistic used to estimate BC is $Y = \widehat{BC}$,

$$Y = \widehat{BC} = \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} c_{i|j}\, p_j\, \frac{X_{i|j}}{n_j} \tag{4.27}$$


Because each misclassification with respect to truth must be considered uniquely (for example, $X_{2|1}$ vs $X_{3|1}$), the $k^2$ random variables $\mathbf{X} = (X_{1|1}, X_{2|1}, \ldots, X_{k-1|k}, X_{k|k})$ must be modeled with the multinomial distribution¹

$$\mu(\mathbf{x} \mid \mathbf{p}) = \prod_{j=1}^{k} \mu_j(\mathbf{x}_j) = \prod_{j=1}^{k} \left( n_j! \prod_{i=1}^{k} \frac{p_{i|j}^{\,x_{i|j}}}{x_{i|j}!} \right) \tag{4.8}$$

¹To reduce computation time when searching for the lower and upper CI bounds on BC, if any of the $k$ classes have equal weights on the class misclassifications, this class's total misclassification may be modeled as binomial, and the binomial pmf may be used for that specific $\mu_j(\mathbf{x}_j)$.
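Here $\mathbf{X} = (X_{1|1}, X_{2|1}, \ldots, X_{k-1|k}, X_{k|k})$ and $\mathbf{p} \in \mathcal{P} = \{\mathbf{p} = (\mathbf{p}_1, \ldots, \mathbf{p}_k) : \mathbf{p}_j = (p_{1|j}, \ldots, p_{k|j}),\ p_{i|j} \ge 0,\ \text{and } \sum_{i=1}^{k} p_{i|j} = 1\}$. Again let $\mathcal{M}$ be the joint multinomial sample space defined in Section 4.2.1. Similar to the method in Section 4.2.1, the CDF of $Y$ with respect to the multinomial misclassification probabilities can be written as

$$F_Y(y \mid \mathbf{p}) = \sum_{\substack{t \in \mathcal{Y} \\ t \le y}} f_Y(t \mid \mathbf{p}) = \sum_{\substack{t \in \mathcal{Y} \\ t \le y}} \sum_{\substack{\mathbf{x} \in \mathcal{M} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \tag{4.28}$$

where $\mu(\mathbf{x} \mid \mathbf{p})$ is defined in Equation 4.8 and $\mathcal{Y} = \{y : y = \sum_{i=1}^{k} \sum_{j \ne i} c_{i|j}p_j x_{i|j}/n_j,\ \mathbf{x} \in \mathcal{M}\}$.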
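For unequal weights the enumeration must run over the multinomial space $\mathcal{M}$. The sketch below evaluates Equation 4.28 by brute force, using the cost matrix and sample sizes from the Figure 4.2 example with equal prevalences. The probabilities in `p` are illustrative assumptions, and the layout conventions (`x[j][i]` holds $x_{i|j}$, `cost[i][j]` holds $c_{i|j}$) are ours. Exact fractions keep tied values of $Y$ equal.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def compositions(n, k):
    """All k-tuples of nonnegative integers summing to n."""
    if k == 1:
        return [(n,)]
    return [(a,) + rest for a in range(n, -1, -1) for rest in compositions(n - a, k - 1)]


def multinomial_pmf(counts, probs):
    """n! / prod(x_i!) * prod(p_i ** x_i) for one class (one factor of Equation 4.8)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    value = float(coef)
    for c, q in zip(counts, probs):
        value *= q**c
    return value


def y_stat(x, ns, cost, prev):
    """Equation 4.27 with x[j][i] = x_{i|j} and cost[i][j] = c_{i|j}."""
    k = len(ns)
    return sum(Fraction(cost[i][j]) * prev[j] * Fraction(x[j][i], ns[j])
               for j in range(k) for i in range(k) if i != j)


def cdf_y(y, ns, cost, prev, p):
    """F_Y(y | p) from Equation 4.28, summing mu(x | p) over the joint space M."""
    k = len(ns)
    total = 0.0
    for x in product(*(compositions(n, k) for n in ns)):
        if y_stat(x, ns, cost, prev) <= y:
            mu = 1.0
            for xj, pj in zip(x, p):
                mu *= multinomial_pmf(xj, pj)
            total += mu
    return total


cost = [[0, 3, 5], [3, 0, 1], [1, 5, 0]]   # cost matrix of the Figure 4.2 example
ns = (3, 5, 6)
prev = [Fraction(1, 3)] * 3                # equal prevalences p_j = 1/k
p = [(0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8)]  # illustrative p_j = (p_{1|j}, p_{2|j}, p_{3|j})
print(cdf_y(Fraction(1), ns, cost, prev, p))
```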

When all $c_{i|j}p_j$ are not equal, for $i \ne j$, BC is no longer defined simply as the sum of the misclassification probabilities. Therefore, a small increase of $\epsilon$ in any one $p_{i|j}$ will not necessarily result in an increase of $\epsilon$ in BC. When the weights differ, a small increase in any one $p_{i|j}$ has a different impact on BC depending on that misclassification probability's cost and prevalence. Therefore, if $\overline{F}_Y(y \mid BC)$ and $\underline{F}_Y(y \mid BC)$ are defined as they were for equal weights in Equations 4.18 and 4.19 in Section 4.2.1, the coverage probability of the CI will not be guaranteed for unequal costs of misclassification. Instead, a small adjustment is made to these definitions to ensure that coverage for the CI around BC with unequal costs or prevalences meets the desired level of $1 - \alpha$. Define two step functions,


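$$F_Y^{U}(y \mid BC_U) = \max_{BC \ge BC_U}\left[\overline{F}_Y(y \mid BC)\right] = \max_{BC \ge BC_U}\left[\max_{\substack{\mathbf{p}:\, \mathbf{p}^T\mathbf{c} = BC \\ BC \in \mathcal{B}}} \sum_{t \le y} \sum_{\substack{\mathbf{x} \in \mathcal{M} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p})\right] \tag{4.29}$$

$$1 - F_Y^{L}(y \mid BC_L) = \max_{BC \le BC_L}\left[1 - \underline{F}_Y(y \mid BC)\right] = \max_{BC \le BC_L}\left[1 - \min_{\substack{\mathbf{p}:\, \mathbf{p}^T\mathbf{c} = BC \\ BC \in \mathcal{B}}} \sum_{t \le y} \sum_{\substack{\mathbf{x} \in \mathcal{M} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p})\right] \tag{4.30}$$

where $\mathbf{c}$ is a vector of the constant multipliers to be placed on each misclassification probability ($\mathbf{c} = (\mathbf{c}_1, \ldots, \mathbf{c}_k)$, where $\mathbf{c}_j = (c_{1|j}p_j, \ldots, c_{k|j}p_j)$ and $c_{i|j}p_j \in \mathbb{R}^+$). A plot of $F_Y^{U}(y \mid BC)$, $1 - F_Y^{L}(y \mid BC)$, $\overline{F}_Y(y \mid BC)$, and $1 - \underline{F}_Y(y \mid BC)$ is presented in Figure 4.2 for an example scenario where $\widehat{BC} = 0.99$ when $n_1 = 3$, $n_2 = 5$, $n_3 = 6$, and

$$\text{Cost} = \begin{bmatrix} 0 & 3 & 5 \\ 3 & 0 & 1 \\ 1 & 5 & 0 \end{bmatrix}$$

(all $p_j$ are assumed equal to $1/k$).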
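The step functions in Equations 4.29 and 4.30 are running maxima over the BC grid. A minimal sketch follows, assuming $\overline{F}_Y$ and $\underline{F}_Y$ have already been evaluated on an ascending grid of BC values (for example by the grid search outlined in Section 4.2.3). The toy numbers are invented solely to show a non-monotone case; they are not results from Figure 4.2.

```python
from itertools import accumulate


def step_functions(f_bar, f_under):
    """Return F^U (Eq. 4.29) and 1 - F^L (Eq. 4.30) on an ascending BC grid."""
    # Eq. 4.29: for each BC_U, the largest F-bar value over all grid BC >= BC_U.
    upper_step = list(accumulate(reversed(f_bar), max))[::-1]
    # Eq. 4.30: for each BC_L, the largest 1 - F-under value over all grid BC <= BC_L.
    lower_step = list(accumulate((1 - f for f in f_under), max))
    return upper_step, lower_step


def bounds_from_steps(bc_grid, upper_step, lower_step, alpha=0.05):
    """Steps 7 and 9: smallest BC with F^U <= alpha/2, largest BC with 1 - F^L <= alpha/2."""
    upper = min((bc for bc, v in zip(bc_grid, upper_step) if v <= alpha / 2), default=bc_grid[-1])
    lower = max((bc for bc, v in zip(bc_grid, lower_step) if v <= alpha / 2), default=bc_grid[0])
    return lower, upper


bc_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
f_bar = [1.0, 0.3, 0.02, 0.04, 0.0]    # dips below alpha/2 at BC = 1.0, then rises again
f_under = [1.0, 0.9, 0.99, 0.5, 0.0]
upper_step, lower_step = step_functions(f_bar, f_under)
print(upper_step)                                           # [1.0, 0.3, 0.04, 0.04, 0.0]
print(bounds_from_steps(bc_grid, upper_step, lower_step))   # (0.0, 2.0)
```

Applying Step 7 to the raw `f_bar` values would stop at BC = 1.0, where the curve dips below $\alpha/2$; the running maximum moves the upper bound to 2.0, which is the conservative adjustment described above. The lower bound behaves symmetrically.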
Figure 4.2: Example of $F_Y^{U}(y \mid BC_U)$ and $F_Y^{L}(y \mid BC_L)$ plotted vs BC for an observed $\widehat{BC} = 0.98889$ when $n_1 = 3$, $n_2 = 5$, $n_3 = 6$, and $\text{Cost} = \begin{bmatrix} 0 & 3 & 5 \\ 3 & 0 & 1 \\ 1 & 5 & 0 \end{bmatrix}$. The values for $\overline{F}_Y(y \mid BC)$ and $1 - \underline{F}_Y(y \mid BC)$ are plotted with the decreasing and increasing black dots, respectively. The values for $F_Y^{U}(y \mid BC_U)$ are plotted with the blue solid line and those for $1 - F_Y^{L}(y \mid BC_L)$ with the red dashed line. The black horizontal line is drawn at $\alpha/2 = 0.025$.


The $(1-\alpha)100\%$ fiducial interval for BC from an observed statistic $y$ is given by $BC_L$ and $BC_U$:

$$BC_L = \sup\left\{ BC \in \mathcal{B} \text{ such that } \max_{\mathbf{p}:\, \mathbf{p}^T\mathbf{c} \le BC} \sum_{t \ge y} \sum_{\substack{\mathbf{x} \in \mathcal{M} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \le \frac{\alpha}{2} \right\} \tag{4.31}$$

$$BC_U = \inf\left\{ BC \in \mathcal{B} \text{ such that } \max_{\mathbf{p}:\, \mathbf{p}^T\mathbf{c} \ge BC} \sum_{t \le y} \sum_{\substack{\mathbf{x} \in \mathcal{M} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \le \frac{\alpha}{2} \right\} \tag{4.32}$$

and $\mathcal{B}$ is the parameter space for BC with unequal weights, where $\mathcal{B} = \{BC : BC = \mathbf{p}^T\mathbf{c},\ \mathbf{c} = (\mathbf{c}_1, \ldots, \mathbf{c}_k),\ \mathbf{c}_j = (c_{1|j}p_j, \ldots, c_{k|j}p_j),\ c_{i|j}p_j \in \mathbb{R}^+,\ \mathbf{p} \in \mathcal{P}\}$. When $y = 0$ or $y = \sup\{\mathcal{Y}\}$, the lower bound is $BC_L = 0$ or the upper bound is $BC_U = \sup\{\mathcal{B}\}$, respectively. This is because $Y \in [0, \sup\{\mathcal{Y}\}]$ when all $c_{i|j}p_j$ are not equal, for $i \ne j$, and all $c_{i|j}p_j$ are greater than or equal to zero, making $\Pr(Y \ge 0 \mid BC) = 1$ and $\Pr(Y \le \sup\{\mathcal{Y}\} \mid BC) = 1$. The lower and upper bounds given in Equations 4.31 and 4.32 may be found by searching all $\mathbf{p}$ within a certain tolerance, which is why they are solved using inequalities to meet the desired minimum coverage. The coverage of this CI is addressed in the following theorem and proof.


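Theorem 6. The upper and lower bounds for BC given by

$$BC_L = \sup\left\{ BC \in \mathcal{B} \text{ such that } \max_{\mathbf{p}:\, \mathbf{p}^T\mathbf{c} \le BC} \sum_{t \ge y} \sum_{\substack{\mathbf{x} \in \mathcal{M} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \le \frac{\alpha}{2} \right\} \tag{4.31}$$

$$BC_U = \inf\left\{ BC \in \mathcal{B} \text{ such that } \max_{\mathbf{p}:\, \mathbf{p}^T\mathbf{c} \ge BC} \sum_{t \le y} \sum_{\substack{\mathbf{x} \in \mathcal{M} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p}) \le \frac{\alpha}{2} \right\} \tag{4.32}$$

create a $(1-\alpha)100\%$ fiducial interval around BC when weights on misclassification rates are not equal, with a confidence coefficient of at least $(1-\alpha)100\%$.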

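Proof. Let $BC \in \mathcal{B}$, $y = \widehat{BC}$, $\mathbf{p} \in \mathcal{P}$ be the $k$ joint multinomial probabilities from a $k$-class classification system, and $\mathbf{c} = (\mathbf{c}_1, \ldots, \mathbf{c}_k)$, $\mathbf{c}_j = (c_{1|j}p_j, \ldots, c_{k|j}p_j)$, and $c_{i|j}p_j \in \mathbb{R}^+$. Also, $BC = \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} c_{i|j}p_jp_{i|j}$. For the upper bound this results in

$$\Pr(Y \le y \mid BC_U) \le \max_{BC \ge BC_U}\left[\max_{\substack{\mathbf{p}:\, \mathbf{p}^T\mathbf{c} = BC \\ BC \in \mathcal{B}}} \sum_{t \le y} \sum_{\substack{\mathbf{x} \in \mathcal{M} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p})\right] \;\Longrightarrow\; \Pr(Y \le y \mid BC_U) \le \frac{\alpha}{2} \tag{4.33}$$

Now let $y^*$ be the ordered value of $Y \in \mathcal{Y}$ directly less than $y$. Then for the lower bound,

$$\Pr(Y \ge y \mid BC_L) \le \max_{BC \le BC_L}\left[1 - \min_{\substack{\mathbf{p}:\, \mathbf{p}^T\mathbf{c} = BC \\ BC \in \mathcal{B}}} \sum_{t \le y^*} \sum_{\substack{\mathbf{x} \in \mathcal{M} \\ Y = t}} \mu(\mathbf{x} \mid \mathbf{p})\right] \;\Longrightarrow\; \Pr(Y \ge y \mid BC_L) \le \frac{\alpha}{2} \tag{4.34}$$

The confidence coefficient for the fiducial interval around any BC is

$$\begin{aligned}
\Pr(BC_L \le BC \le BC_U) &= \Pr(Y \le y \mid BC_L) - \Pr(Y \le y \mid BC_U) \\
&= 1 - \Pr(Y > y \mid BC_L) - \Pr(Y \le y \mid BC_U) \\
&\ge 1 - \frac{\alpha}{2} - \frac{\alpha}{2} = 1 - \alpha \\
\Longrightarrow\ \Pr\big(BC \in [BC_L, BC_U] \mid y\big) &\ge 1 - \alpha
\end{aligned} \tag{4.35}$$

□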


Once again, the proof for Theorem 6 does not depend on the sample size used to develop the fiducial interval, and therefore the minimum desired coverage of $(1-\alpha)100\%$ will be met for any sample size. Also, using the definition of the confidence coefficient,

$$\Pr(BC_L \le BC \le BC_U) = \Pr(Y \le y \mid BC_L) - \Pr(Y \le y \mid BC_U), \tag{4.36}$$

the confidence coefficient for this CI for any $BC = \mathbf{p}^T\mathbf{c}$ can be calculated. However, the specific $\mathbf{p}$ must be known in order to determine the probability of observing each $\mathbf{X}$ in the $\mathcal{M}$ sample space. For this reason, the confidence coefficient can be calculated for a specific set of misclassification probabilities for each class, but not explicitly for a given BC, because there are infinitely many $\mathbf{p}$ that could result in each BC (except the trivial cases where $BC = 0$ or $BC = \sup\{\mathcal{B}\}$).
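To illustrate the point about Equation 4.36, the sketch below computes the exact confidence coefficient for two specific $\mathbf{p}$ vectors that share the same BC. It deliberately uses the simpler equal-weights, equal-sample-size setting of Section 4.2.3.2, where the interval is $k$ times a Clopper-Pearson interval, rather than the unequal-weights interval of Theorem 6; the helper names and probabilities are illustrative assumptions. The coefficient is the sum of $\mu(\mathbf{x} \mid \mathbf{p})$ over all $\mathbf{x} \in \mathcal{S}$ whose interval contains the true BC.

```python
from itertools import product
from math import comb


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    return sum(binom_pmf(i, n, p) for i in range(x + 1))


def bisect_increasing(f, lo=0.0, hi=1.0, tol=1e-12):
    """Root of a function that is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha):
    lower = 0.0 if x == 0 else bisect_increasing(lambda q: 1 - binom_cdf(x - 1, n, q) - alpha / 2)
    upper = 1.0 if x == n else bisect_increasing(lambda q: alpha / 2 - binom_cdf(x, n, q))
    return lower, upper


def confidence_coefficient(p, n, alpha=0.05):
    """Exact Pr(BC_L <= BC <= BC_U) for given per-class p_{j^c|j} (equal n and weights)."""
    k = len(p)
    bc = sum(p)
    # The interval depends only on the total misclassification count, so cache it.
    intervals = {x: clopper_pearson(x, k * n, alpha) for x in range(k * n + 1)}
    coverage = 0.0
    for x in product(range(n + 1), repeat=k):
        lo, hi = intervals[sum(x)]
        if k * lo <= bc <= k * hi:
            mu = 1.0
            for xj, pj in zip(x, p):
                mu *= binom_pmf(xj, n, pj)
            coverage += mu
    return coverage


c_even = confidence_coefficient([0.2, 0.2], n=10)    # misclassification split evenly
c_uneven = confidence_coefficient([0.4, 0.0], n=10)  # same BC = 0.4, different p
print(c_even, c_uneven)
```

The two printed values generally differ even though both vectors give $BC = 0.4$, which is why the confidence coefficient is reported for specific misclassification probabilities rather than for BC alone.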

4.2.3  Fiducial  Interval  around  Bayes  Cost  Algorithm. 

A general procedure for finding the fiducial interval around BC is presented in Section 4.2.3.1. A simplified procedure is presented in Section 4.2.3.2 for scenarios where the weights on misclassification outcomes ($c_{i|j}p_j$) and all class sample sizes ($n_j$) are equal. If $\widehat{BC} = 0$ or $\widehat{BC} = \sup\{\mathcal{Y}\}$, the lower bound is 0 or the upper bound is $\sup\{\mathcal{B}\}$, respectively. For such a case, the algorithm should be used to find the remaining upper or lower bound only.


4.2.3.1 General Case.


The following is an outline of steps to compute the proposed fiducial interval for $k$ classes, an observed $y = \widehat{BC}$, and a classification system with either equal or unequal weights (explained with options for equal [unequal] weights throughout).
1. Create the joint binomial [multinomial] sample space, $\mathcal{S}$ [$\mathcal{M}$], for the $k$ independent binomial [multinomial] distributions from each class for equal [unequal] weights (this sample space will have $\prod_{j=1}^{k}(n_j + 1)$ $\left[\prod_{j=1}^{k}\binom{n_j + k - 1}{k - 1}\right]$ elements).

2. Order the sample space, $\mathcal{S}$ [$\mathcal{M}$], by each element's resulting $y = \sum_{i=1}^{k} \sum_{j \ne i} c_{i|j}p_j x_{i|j}/n_j$.

3. Create the joint binomial [multinomial] parameter space, $\mathbf{p} = (p_{1^c|1}, \ldots, p_{k^c|k})$ [$\mathbf{p} = (\mathbf{p}_1, \ldots, \mathbf{p}_k)$], to search for $BC_L$ and $BC_U$. This parameter space is infinite; therefore, the search for the upper and lower bounds on BC will only consider all $\mathbf{p}$ generated by a specified step or precision, $\delta$. (It is recommended to start with a larger $\delta$, such as $\delta = 0.2$, and consider smaller $\delta$ while narrowing in on the solution to conserve code run time.)

4. For each element of the parameter space created in Step 3, apply Equation 4.16 [4.8] and sum and store the resulting $\mu(\mathbf{x} \mid \mathbf{p})$ from all elements of the $\mathcal{S}$ [$\mathcal{M}$] sample space whose corresponding $y$ is less than or equal to $\widehat{BC}$.

5. Calculate $BC = \sum_{j=1}^{k} p_{j^c|j}$ $\left[BC = \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} c_{i|j}p_jp_{i|j}\right]$ for each element in the joint parameter space created in Step 3.

6. For each fixed BC resulting from the parameter space found in Step 5, determine and store the maximum value of the sum in Step 4 (this gives $\overline{F}_Y(y \mid BC)$).

6a. For unequal weights only, create the step function in Equation 4.29. This is done by determining the maximum value from Step 6 for all $BC \in \mathcal{B}$ values greater than or equal to each specific BC value. For each BC value this gives $F_Y^{U}(y \mid BC)$.

7. The upper bound ($BC_U$) is determined as the smallest BC whose maximum value from Step 6 [6a] is $\le \alpha/2$.

8. For the lower bound, repeat Steps 4-6; however, instead of summing the elements of the binomial [multinomial] sample space where $y \le \widehat{BC}$, sum the elements for which $y \ge \widehat{BC}$. Then, for each BC from the binomial [multinomial] parameter space, determine the maximum value resulting from this sum, which gives $1 - \underline{F}_Y(y \mid BC)$.

8a. For unequal weights only, the maximum value of $1 - \underline{F}_Y(y \mid BC)$ over all $BC \in \mathcal{B}$ less than or equal to each fixed BC value is found, which gives $1 - F_Y^{L}(y \mid BC)$ (Equation 4.30).

9. The lower bound ($BC_L$) is determined as the largest BC where $1 - \underline{F}_Y(y \mid BC)$ $\left[1 - F_Y^{L}(y \mid BC)\right]$ is $\le \alpha/2$.

10. Improve the precision of the solution by repeating Steps 3-9 iteratively, using parameter spaces with smaller $\delta$ values. Before applying Steps 4-9, reduce the joint binomial [multinomial] parameter space to be searched by only considering elements resulting in BC values within $\pm 2\delta$ of the previous $BC_L$ or $BC_U$ for finding the lower or upper bound, respectively.
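Steps 3 and 10 can be sketched as a grid generator plus a filter. This is an illustration under our own assumptions: only the multinomial [unequal-weights] parameter space is shown, each class's probabilities lie on a simplex grid with step $\delta = 1/m$, and the names `simplex_grid`, `multinomial_parameter_grid`, and `narrowed_grid` are hypothetical.

```python
from itertools import product


def compositions(n, k):
    """All k-tuples of nonnegative integers summing to n."""
    if k == 1:
        return [(n,)]
    return [(a,) + rest for a in range(n, -1, -1) for rest in compositions(n - a, k - 1)]


def simplex_grid(k, m):
    """Grid on the probability simplex with step delta = 1/m: (a_1/m, ..., a_k/m), sum a_i = m."""
    return [tuple(a / m for a in comp) for comp in compositions(m, k)]


def multinomial_parameter_grid(k, m):
    """Step 3 [unequal weights]: one simplex grid point p_j = (p_{1|j}, ..., p_{k|j}) per class."""
    return product(simplex_grid(k, m), repeat=k)


def bayes_cost(p, cost, prev):
    """Equation 4.26 with p[j][i] = p_{i|j}, cost[i][j] = c_{i|j}, prev[j] = p_j."""
    k = len(p)
    return sum(cost[i][j] * prev[j] * p[j][i] for j in range(k) for i in range(k) if i != j)


def narrowed_grid(k, m, cost, prev, previous_bound, previous_delta):
    """Step 10: keep grid points whose BC lies within +/- 2 * previous_delta of the previous bound."""
    tol = 2 * previous_delta
    return [p for p in multinomial_parameter_grid(k, m)
            if abs(bayes_cost(p, cost, prev) - previous_bound) <= tol]


# delta = 0.2 for k = 3 classes: 21 simplex points per class, 21**3 joint parameter vectors
print(sum(1 for _ in multinomial_parameter_grid(3, 5)))  # 9261
```

The filter above still enumerates the full finer grid before discarding points; for larger $k$ one would instead generate only points near the previous bound, which this sketch does not attempt.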

4.2.3.2 Special Case: Equal Sample Sizes and Weights.

When sample sizes ($n_j$) and all weights ($c_{i|j}p_j$) on misclassification outcomes within the classes are equal, the previous steps may be used, or, more efficiently, the following may be used. For this special case, the fiducial interval around BC reduces to a multiple of the Clopper-Pearson CI around a binomial probability of success (where a success is defined as an incorrect classification). This is demonstrated WLOG assuming an equal weight of one for all misclassification probabilities.

First, it is possible to determine the total misclassification probability from the entire classification system as $p_{mc} = BC/k$. Let the number of misclassifications from the classification system be the binomial random variable $X = X_{1^c|1} + \cdots + X_{k^c|k}$. Then the binomial probability for the total misclassification of the system is estimated by $\hat{p}_{mc} = X/N = \widehat{BC}/k$, which can be written in terms of $\widehat{BC}$ due to the equal sample size and weights in each class ($n = n_j$). Therefore, using the $(1-\alpha)100\%$ Clopper-Pearson fiducial interval constructed around $p_{mc}$ such that $p_{mc} \in [p_{mc,L}, p_{mc,U}]$, the $(1-\alpha)100\%$ fiducial interval around BC is $BC \in [k \times p_{mc,L},\ k \times p_{mc,U}] = [BC_L, BC_U]$. From this result, the fiducial interval for BC is easily computed as a multiple of the closed-form solution to the Clopper-Pearson CI as

$$k \times \left[1 + \frac{N - x + 1}{x\, F_{2x,\, 2(N-x+1),\, 1-\alpha/2}}\right]^{-1} \le BC \le k \times \left[1 + \frac{N - x}{(x+1)\, F_{2(x+1),\, 2(N-x),\, \alpha/2}}\right]^{-1} \tag{4.37}$$

where $x$ is the total number of incorrect classifications observed for the entire sample from the classification system, $F$ represents the F distribution, and $N = k \times n$ [12].
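Equation 4.37 is the F-distribution form of the Clopper-Pearson limits. The sketch below instead solves the defining binomial tail equations, $\Pr(X \ge x \mid p) = \alpha/2$ for the lower limit and $\Pr(X \le x \mid p) = \alpha/2$ for the upper limit, by bisection; this is an equivalent route that needs only the Python standard library, and the function names are ours. The result is then multiplied by $k$ as described above.

```python
from math import comb


def binom_cdf(x, n, p):
    """Pr(X <= x) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def bisect_increasing(f, lo=0.0, hi=1.0, tol=1e-12):
    """Root of a function that is increasing on [lo, hi], found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (1 - alpha) Clopper-Pearson limits for x successes in n trials."""
    # Lower limit solves Pr(X >= x | p) = alpha/2; Pr(X >= x | p) increases with p.
    lower = 0.0 if x == 0 else bisect_increasing(lambda q: 1 - binom_cdf(x - 1, n, q) - alpha / 2)
    # Upper limit solves Pr(X <= x | p) = alpha/2; Pr(X <= x | p) decreases with p.
    upper = 1.0 if x == n else bisect_increasing(lambda q: alpha / 2 - binom_cdf(x, n, q))
    return lower, upper


def fiducial_bc_equal_n(x_total, k, n, alpha=0.05):
    """Special case: BC interval = k times the Clopper-Pearson interval for p_mc = BC / k."""
    lower, upper = clopper_pearson(x_total, k * n, alpha)
    return k * lower, k * upper


# Three classes with n = 10 each and 4 misclassifications in total (illustrative numbers).
print(fiducial_bc_equal_n(4, k=3, n=10))
```

For $x = 0$ the upper limit has the closed form $1 - (\alpha/2)^{1/N}$, which gives a quick check of the bisection.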
4.2.4 Equivalence for the Youden Index.

The fiducial interval around BC when all $c_{i|j}p_j$ are equal, for $i \ne j$, can be used equivalently for any $k$-class $J$. Because the outcomes from each class are modeled as binomial random variables in this framework, let the correct classifications from each class ($X_{j|j}$) be considered a success instead of the misclassifications. Then, the correct classification probability space ($\mathbf{p} = (p_{1|1}, \ldots, p_{k|k})$) will be searched for the upper and lower bounds. Now let $W = \hat{J}$. Then

$$J = \sum_{j=1}^{k} p_{j|j}, \qquad W = \hat{J} = \sum_{j=1}^{k} \frac{X_{j|j}}{n_j} \tag{4.38}$$

where the maximization is excluded because it is assumed the classifier is applied at its optimal settings. The $(1-\alpha)100\%$ upper and lower fiducial bounds for $J$ from an observed statistic $w$ are:

$$J_L = \sup\left\{ J \in \mathcal{J} \text{ such that } 1 - \min_{\mathbf{p}:\, \mathbf{p}^T\mathbf{1} = J} \sum_{t \le w^*} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ W = t}} \mu(\mathbf{x} \mid \mathbf{p}) \le \frac{\alpha}{2} \right\} \tag{4.39}$$

$$J_U = \inf\left\{ J \in \mathcal{J} \text{ such that } \max_{\mathbf{p}:\, \mathbf{p}^T\mathbf{1} = J} \sum_{t \le w} \sum_{\substack{\mathbf{x} \in \mathcal{S} \\ W = t}} \mu(\mathbf{x} \mid \mathbf{p}) \le \frac{\alpha}{2} \right\} \tag{4.40}$$

where $\mathcal{J}$ is the $J$ parameter space such that $\mathcal{J} = \{J : J = \mathbf{p}^T\mathbf{1},\ \mathbf{p} = (p_{1|1}, \ldots, p_{k|k}),\ p_{j|j} \in [0,1]\}$, $\mathcal{S}$ is the joint binomial sample space, which is the set of $1 \times k$ sized vectors such that $\mathcal{S} = \{\mathbf{x} = (x_{1|1}, \ldots, x_{k|k}) : x_{j|j} \in \mathbb{Z}_{\ge 0},\ x_{j|j} \le n_j\}$, $\mathcal{W} = \{w : w = \sum_{j=1}^{k} x_{j|j}/n_j,\ \mathbf{x} \in \mathcal{S}\}$, and $w^*$ is the ordered value of $W \in \mathcal{W}$ directly less than $w$. When $w = 0$ or $w = k$, the lower bound is $J_L = 0$ or the upper bound is $J_U = k$, respectively.


4.3  Bootstrap  Methods 

Bootstrap methods presented in Section 3.4 may be similarly applied here. For comparison to the newly developed nonparametric CI around BC, the BCa bootstrap CI will be used. The BCa bootstrap CI is a practical choice because this CI method is appropriate when the distribution of the parameter estimate is skewed [11]. Recall that since BC is constructed by the minimization of multinomial probabilities, it is expected that this distribution may be skewed. This was observed in the results of the simulation in Section 3.5. The BCa CI also allows the skewness of the distribution to change with the varying parameter, which also might be expected for BC based on the results of Section 3.5 (Figure 3.1) [11]. Finally, the BCa bootstrap CI is used for the nonparametric CI around BC because this CI method was shown to perform best for CIs around BC in Chapter 3 (with BC estimated parametrically) and for CIs around $J$ in [56] (with $J$ estimated empirically).


4.4  Simulation  Results 

A  simulation  study  was  conducted  to  demonstrate  the  performance  of  the  proposed  fiducial 
interval  around  BC.  This  method  is  ideal  for  small  sample  sizes,  and  therefore  the  simulations  are 
run  with  various  equal  and  unequal  small  sample  size  scenarios  for  both  the  two-  and  three-class 
BC.  For  clarity  in  this  section,  the  two-class  BC  is  denoted  BC2  and  the  three-class  BC  denoted 
BC3.  These  are  defined  as 

$$BC_2 = c_{2|1}\, p_1\, p_{2|1} + c_{1|2}\, p_2\, p_{1|2} \tag{4.41}$$

and

$$BC_3 = \sum_{i=1}^{3} \sum_{\substack{j=1 \\ j \ne i}}^{3} c_{i|j}\, p_j\, p_{i|j} \tag{4.42}$$

Multiple values of $BC_2$ and $BC_3$ are considered in order to demonstrate performance of the fiducial interval around BC under differing classification system performance. In addition to varying classification performance scenarios, both equal and unequal weights are considered. The unequal weights scenarios utilize the two unequal cost structures from the simulation in Chapter 3. Recall that these cost structures are

$$\text{Cost}_1 = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{bmatrix} \quad \text{and} \quad \text{Cost}_2 = \begin{bmatrix} 0 & 2 & 5 \\ 1 & 0 & 3 \\ 1 & 3 & 0 \end{bmatrix}, \quad \text{where} \quad \text{Cost} = \begin{bmatrix} c_{1|1} & c_{1|2} & c_{1|3} \\ c_{2|1} & c_{2|2} & c_{2|3} \\ c_{3|1} & c_{3|2} & c_{3|3} \end{bmatrix}.$$

All prevalences are assumed equal ($p_j = 1/k$).

Two distributional scenarios are considered. First, no distributional assumptions about the classification system are made. Then, comparisons are made against other CI techniques when the single feature used for classification is independently and normally distributed for each class. Each distributional scenario is discussed separately. Absolute bias of $\widehat{BC}$ is also presented. All simulation scenarios used 3000 simulation runs in R and $\alpha = 0.05$ [52].

4.4.1  Equal  Costs. 

A cost structure is assumed where $c_{i|j}p_j = 1$, for $i \ne j$. Under this framework, $BC_2 \in [0,2]$ and $BC_3 \in [0,3]$, where $BC_2 = 1$ and $BC_3 = 1.5$ reflect chance classification. The values of BC chosen to reflect a range of classification accuracy are $BC_2 = (0.6, 0.4, 0.2, 0.1)$ and $BC_3 = (0.9, 0.6, 0.3, 0.15)$, such that each $BC_3$ value has the same average misclassification probability as a corresponding $BC_2$ value.

4.4.1.1 No Distributional Assumptions on the System.

Making  no  assumptions  about  the  classification  system’s  structure,  multinomial  random 
variables  are  randomly  generated  representing  outcomes  from  a  classification  system’s  resulting 
contingency  table  (recall  Tables  2.3  and  2.4).  The  misclassification  probabilities  are  assumed  to 
be  equally  distributed  between  all  classes  for  each  BC2  or  BC3  value.  The  fiducial  interval  is 
constructed  around  BC  separately  for  all  3000  simulation  runs  and  the  coverage  probability  and 
average  length  of  the  intervals  calculated.  Absolute  bias  of  the  estimated  BC  is  also  calculated. 
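A minimal Python sketch of this data-generating step follows (the dissertation's simulations were run in R, so this is not the author's code). It assumes equal weights, splits each class's misclassification probability $BC/k$ evenly across the $k-1$ wrong classes, stores true class $j$ in row $j$ (a layout assumption), and reports the mean of $\widehat{BC}$ and its mean absolute deviation from BC. The latter is our own summary and may not match the dissertation's definition of absolute bias.

```python
import random


def class_probabilities(bc, k):
    """Equal weights: p_{j^c|j} = BC/k for every class, split evenly over the k - 1 wrong classes."""
    wrong = bc / (k * (k - 1))
    return [[1 - bc / k if i == j else wrong for i in range(k)] for j in range(k)]


def simulate_table(ns, probs, rng):
    """One contingency table; row j holds (x_{1|j}, ..., x_{k|j}) for true class j."""
    n_classes = len(ns)
    table = []
    for n_j, p_j in zip(ns, probs):
        labels = rng.choices(range(n_classes), weights=p_j, k=n_j)
        table.append([labels.count(i) for i in range(n_classes)])
    return table


def bc_hat(table, ns):
    """Empirical BC with c_{i|j} p_j = 1: the sum over classes of the misclassification rate."""
    return sum((n_j - row[j]) / n_j for j, (row, n_j) in enumerate(zip(table, ns)))


def simulate(bc, ns, runs=3000, seed=2024):
    rng = random.Random(seed)
    probs = class_probabilities(bc, len(ns))
    estimates = [bc_hat(simulate_table(ns, probs, rng), ns) for _ in range(runs)]
    mean = sum(estimates) / runs
    mean_abs_dev = sum(abs(e - bc) for e in estimates) / runs
    return mean, mean_abs_dev


print(simulate(0.9, [10, 10, 10]))  # BC_3 = 0.9, n_j = 10
print(simulate(0.6, [10, 10]))      # BC_2 = 0.6, n_j = 10
```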

The  results  are  presented  in  Table  4.4.  For  all  sample  size  and  BC  scenarios,  the  intervals 
perform  well  with  coverage  probabilities  of  at  least  95%.  Also,  the  average  length  of  the  interval 
decreases  as  the  total  sample  size  increases  and  as  the  classification  performance  improves  (smaller 
BC).  The  absolute  bias  in  the  empirically  estimated  BC  is  higher  for  larger  BC  values  and  decreases 
as $n_j$ increases, mimicking the trend of interval length. Absolute bias is higher for $BC_3$ (absolute bias $\in [0.056, 0.292]$) than for $BC_2$ (absolute bias $\in [0.044, 0.225]$) for equivalent $n_j$.

4.4.1.2  Normally  Distributed  Feature. 

To compare the performance of the proposed fiducial interval to other available CI methods for BC, a classification system with a single feature that is independently and normally distributed for each class and a single threshold between each class (two thresholds for $BC_3$) is assumed. For all scenarios, the variance for each class is assumed equal to one and the means are varied to achieve the desired $BC_2$ or $BC_3$ value. These normal distribution parameters are listed in Table 4.5. The sample sizes considered are held consistent with those in Section 4.4.1.1.
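Under this normal model, $BC_2$ follows directly from the normal CDF. The sketch below assumes unit variances, class 1 below class 2, weights $c_{i|j}p_j = 1$, and a threshold at the midpoint of the two means (the BC-minimizing threshold when weights and variances are equal). The helper names are hypothetical, and the separations it returns are not claimed to be the values in Table 4.5.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def bc2_normal(mu1, mu2, threshold, w21=1.0, w12=1.0):
    """BC_2 of Equation 4.41 for class 1 ~ N(mu1, 1), class 2 ~ N(mu2, 1), mu1 < mu2.

    An observation is labelled class 2 when its feature exceeds `threshold`;
    w21 = c_{2|1} p_1 and w12 = c_{1|2} p_2 (both 1 under the equal-cost framework).
    """
    p21 = 1 - PHI.cdf(threshold - mu1)  # class-1 observation above the threshold
    p12 = PHI.cdf(threshold - mu2)      # class-2 observation below the threshold
    return w21 * p21 + w12 * p12


def separation_for(bc2_target):
    """Mean separation d giving bc2_target when weights are 1 and the threshold is the midpoint."""
    # At the midpoint both error rates equal Phi(-d/2), so BC_2 = 2 * Phi(-d/2).
    return -2 * PHI.inv_cdf(bc2_target / 2)


for target in (0.6, 0.4, 0.2, 0.1):
    d = separation_for(target)
    print(target, round(d, 4), round(bc2_normal(0.0, d, d / 2), 6))
```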

For both the two- and three-class scenarios, three methods in addition to the fiducial interval are compared. The first method is a nonparametric BCa bootstrap CI. In [56], the BCa bootstrap CI is shown to have good coverage around $J_2$ for $n_j > 50$ when $J_2$ is estimated empirically. However, in [36], the BCa bootstrap is shown to perform well for slightly smaller sample sizes when $J_3$ is estimated parametrically (defining $J$ as a function of the normal distribution parameters from the features). Recall, when all $c_{i|j}p_j$ are equal, for $i \ne j$, $J$ and BC may be used equivalently where


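Table 4.4: Simulation coverage probability and length for 95% fiducial intervals around BC for two and three classes when all $c_{i|j}p_j$ are equal, for $i \ne j$, making no distributional assumptions on the classification system.

$k = 2$:

$$\begin{array}{cc|cc|cc|cc|cc}
 & & BC_2 = 0.6 & & BC_2 = 0.4 & & BC_2 = 0.2 & & BC_2 = 0.1 & \\
n_1 & n_2 & \text{Cov} & \text{Len} & \text{Cov} & \text{Len} & \text{Cov} & \text{Len} & \text{Cov} & \text{Len} \\ \hline
5 & 5 & 99.03 & 1.12 & 99.57 & 1.01 & 99.00 & 0.85 & 98.70 & 0.74 \\
6 & 9 & 96.00 & 0.92 & 97.47 & 0.82 & 98.93 & 0.68 & 98.27 & 0.58 \\
10 & 10 & 97.53 & 0.83 & 98.10 & 0.73 & 99.00 & 0.58 & 98.50 & 0.48 \\
12 & 18 & 95.60 & 0.67 & 96.00 & 0.60 & 98.77 & 0.47 & 99.33 & 0.38 \\
20 & 20 & 96.37 & 0.59 & 97.27 & 0.52 & 97.03 & 0.41 & 98.70 & 0.31 \\
22 & 28 & 95.47 & 0.51 & 95.90 & 0.46 & 96.33 & 0.35 & 98.23 & 0.27 \\
30 & 30 & 96.70 & 0.48 & 96.50 & 0.43 & 97.27 & 0.33 & 99.10 & 0.25
\end{array}$$

$k = 3$:

$$\begin{array}{ccc|cc|cc|cc|cc}
 & & & BC_3 = 0.9 & & BC_3 = 0.6 & & BC_3 = 0.3 & & BC_3 = 0.15 & \\
n_1 & n_2 & n_3 & \text{Cov} & \text{Len} & \text{Cov} & \text{Len} & \text{Cov} & \text{Len} & \text{Cov} & \text{Len} \\ \hline
5 & 5 & 5 & 97.80 & 1.41 & 98.00 & 1.26 & 98.73 & 1.02 & 99.40 & 0.85 \\
4 & 6 & 10 & 96.53 & 1.25 & 98.27 & 1.12 & 98.27 & 0.91 & 98.97 & 0.78 \\
10 & 10 & 10 & 96.97 & 1.02 & 97.93 & 0.91 & 99.17 & 0.71 & 98.53 & 0.56 \\
8 & 12 & 20 & 95.67 & 0.91 & 96.77 & 0.82 & 98.63 & 0.64 & 99.00 & 0.52 \\
20 & 20 & 20 & 96.60 & 0.72 & 96.27 & 0.64 & 97.03 & 0.49 & 98.97 & 0.37 \\
24 & 16 & 30 & 95.17 & 0.67 & 95.37 & 0.59 & 98.03 & 0.47 & 99.23 & 0.36 \\
30 & 30 & 30 & 96.17 & 0.59 & 96.47 & 0.52 & 96.13 & 0.40 & 97.53 & 0.30
\end{array}$$

Cov = coverage probability; Len = length.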


BC2  =  I  -  J2  and  BC3  =  3-/3  (where  J2  =  p\\i  +  P2\2  -  1  and  Jt,  -  pi\i  +  p2\2  +  P3|3)-  Therefore, 
two  BCa  bootstrap  CIs  are  constructed  around  both  BC2  and  BC3,  one  utilizing  an  empirical  and 
the  other  a  parametric  estimation  of  BC  as  described  in  [56]  and  [36]  (denoted  BCa^  and  BCap, 
respectively).  For  both  BCa  CIs,  999  nonparametric  bootstrap  samples  are  used.  In  addition,  the 
delta  method  Cl  (see  Section  3.2)  and  the  GCI  (see  Section  3.3)  are  also  used  for  comparison  to 
the  fiducial  inferval.  For  fhe  implemenfafion  of  fhese  CIs,  fhe  classifier  is  applied  fo  fhe  random 
samples  from  fhe  normal  disfribufions  fo  consfrucf  fhe  resulfing  confingency  fable  (in  fhe  spirif  of 
Table  2.4),  and  fhen  fhe  appropriafe  Cl  mefhod  is  applied. 
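The contingency-table step described above can be sketched as follows: draw normal samples for each class, assign each observation to a class by comparing it with the ordered thresholds, tabulate the counts, and compute the empirical BC from $\hat p_{i|j} = n_{i|j}/n_j$. This is a hypothetical Python illustration, not the R code used in the study; the function names, the random seed, and the tie rule (an observation equal to a threshold goes to the lower class) are assumptions. It uses the $BC_3 = 0.9$ means from Table 4.5 with 10 observations per class and equal weights $c_{i|j}P_j = 1$.

```python
import random
from typing import List, Sequence

def classify(x: float, thresholds: Sequence[float]) -> int:
    """0-based class index: class m when thresholds[m-1] < x <= thresholds[m]."""
    for m, theta in enumerate(thresholds):
        if x <= theta:
            return m
    return len(thresholds)

def contingency_table(samples: Sequence[Sequence[float]],
                      thresholds: Sequence[float]) -> List[List[int]]:
    """table[i][j] = number of true-class-j observations assigned to class i."""
    k = len(samples)
    table = [[0] * k for _ in range(k)]
    for j, xs in enumerate(samples):
        for x in xs:
            table[classify(x, thresholds)][j] += 1
    return table

def empirical_bc(table, weights=None) -> float:
    """Sum over i != j of w_ij * n_{i|j} / n_j, with w_ij = 1 unless weights are given."""
    k = len(table)
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    bc = 0.0
    for i in range(k):
        for j in range(k):
            if i != j:
                w = 1.0 if weights is None else weights[i][j]
                bc += w * table[i][j] / col_totals[j]
    return bc

rng = random.Random(2024)
mus, sizes = [-1.0, 0.0, 2.148], [10, 10, 10]          # sigma_j = 1 for every class
thresholds = [(a + b) / 2.0 for a, b in zip(mus, mus[1:])]
samples = [[rng.gauss(mu, 1.0) for _ in range(n)] for mu, n in zip(mus, sizes)]
table = contingency_table(samples, thresholds)
print(table)
print(round(empirical_bc(table), 3))
```

The resulting table is the input that every contingency-table-based CI in the comparison starts from; only the interval construction differs between methods.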

One additional method is also considered for comparing CIs around $BC_2$. Because this method was developed for the two-class framework only, it is not used in the simulation for $BC_3$.
Table 4.5: Normal distribution parameters used in the fiducial interval simulation, with each $\sigma_j = 1$

# of Classes    BC      μ1      μ2      μ3
k = 2           0.6     0       1.049   -
                0.4     0       1.683   -
                0.2     0       2.563   -
                0.1     0       3.290   -
k = 3           0.9    -1.0     0       2.148
                0.6    -1.5     0       2.902
                0.3    -2.5     0       3.405
                0.15   -3.6     0       3.523

This final method is a nonparametric method which assumes there is a single threshold between the two classes but makes no assumptions about the distribution of the feature. It is based on the Agresti-Coull CI for a binomial proportion and uses a bootstrap to determine the CI bounds (denoted NP) [79]. This method uses the estimate of J given in Equation 2.33 (easily modified for BC). Once again, since all weights are fixed to be equal, this CI method may be used equivalently for $BC_2$. The coverage and length of all CIs around $BC_2$ and $BC_3$ are determined from 3000 simulation runs for the normally distributed feature. All simulations are run in R, and the boot package is used for all bootstrapped CIs [10, 15, 52].

The results are presented in Table 4.6 for two classes and Table 4.7 for three classes (due to the poor performance of the NP CI, its results are given in the Appendix, Section B.2). The proposed fiducial method meets or exceeds the desired coverage probability of 95% for all sample sizes and BC values considered. Also, as in the simulation scenario which made no assumptions about the underlying distributions, the length of the fiducial interval decreases as the total sample size increases and as the BC value decreases. Since lower BC values indicate a more accurate classification system, the proposed CI performs best (when length is also considered) for accurate systems.
Table 4.6: Simulation coverage probability and length for multiple methods' 95% CI around $BC_2$ for two classes with a normally distributed feature when all $c_{i|j}P_j$ are equal, for $i \neq j$.

                     Fiducial        Delta           BCa_P           BCa_E           GCI
 n1   n2   BC2     Cov    Len      Cov    Len      Cov    Len      Cov    Len      Cov    Len
  5    5   0.6   100.0   1.10    86.07   0.80    86.10   0.67    71.90   1.32    96.93   0.77
           0.4   99.87   0.98    85.17   0.72    83.67   0.62    80.97   1.42    97.53   0.75
           0.2   98.90   0.80    82.00   0.53    78.93   0.43    64.30   1.16    98.37   0.62
           0.1   98.80   0.70    78.37   0.37    75.77   0.28    21.80   0.73    98.60   0.50
  6    9   0.6   96.57   0.91    90.03   0.71    91.07   0.63    88.30   0.91    96.53   0.67
           0.4   98.80   0.81    88.80   0.63    89.23   0.59    88.07   0.93    97.07   0.64
           0.2   99.27   0.65    85.70   0.46    85.47   0.42    77.40   0.73    97.80   0.51
           0.1   99.00   0.56    82.47   0.31    83.33   0.28    41.17   0.46    98.00   0.39
 10   10   0.6   98.60   0.82    91.73   0.61    93.33   0.57    84.26   0.76    96.06   0.59
           0.4   97.03   0.72    90.33   0.54    92.23   0.53    90.20   0.76    96.10   0.55
           0.2   98.97   0.57    87.50   0.39    89.60   0.38    88.07   0.57    96.60   0.42
           0.1   98.80   0.46    85.97   0.26    87.97   0.26    37.20   0.34    96.63   0.31
 12   18   0.6   96.83   0.67    92.63   0.52    93.63   0.50    92.33   0.67    95.40   0.51
           0.4   96.10   0.59    91.57   0.46    93.17   0.46    94.07   0.64    95.37   0.46
           0.2   99.10   0.46    89.40   0.33    91.93   0.33    91.87   0.52    96.00   0.35
           0.1   99.60   0.36    87.37   0.22    90.97   0.23    84.03   0.34    96.13   0.25
 20   20   0.6   95.63   0.59    92.13   0.45    93.47   0.44    89.63   0.59    94.33   0.44
           0.4   97.47   0.52    91.67   0.39    93.27   0.39    94.33   0.55    94.57   0.39
           0.2   96.17   0.40    90.37   0.29    92.33   0.29    94.07   0.44    94.63   0.29
           0.1   99.00   0.30    88.97   0.19    91.73   0.20    78.10   0.31    94.76   0.21
 22   28   0.6   95.87   0.51    92.90   0.40    93.80   0.40    93.27   0.53    94.80   0.40
           0.4   96.00   0.45    92.30   0.36    93.60   0.36    95.20   0.48    95.07   0.36
           0.2   95.60   0.35    91.20   0.26    93.23   0.26    94.43   0.39    95.43   0.26
           0.1   98.57   0.26    89.23   0.17    92.77   0.18    91.37   0.29    95.63   0.18
 30   30   0.6   97.13   0.48    94.37   0.37    94.93   0.37    90.57   0.49    95.70   0.37
           0.4   97.00   0.43    93.93   0.32    94.50   0.33    94.83   0.44    95.80   0.32
           0.2   97.37   0.33    93.33   0.24    93.47   0.24    95.93   0.36    95.47   0.24
           0.1   99.27   0.24    91.70   0.16    93.12   0.16    84.87   0.27    95.63   0.17

Cov - coverage probability; Len - length; BCa_P - bias corrected and accelerated/parametric estimate; BCa_E - bias corrected and accelerated/empirical estimate; GCI - generalized confidence interval
Table 4.7: Simulation coverage probability and length for multiple methods' 95% CI around $BC_3$ for three classes with a normally distributed feature when all $c_{i|j}P_j$ are equal, for $i \neq j$.

                          Fiducial        Delta           BCa_P           BCa_E           GCI
 n1   n2   n3   BC3     Cov    Len      Cov    Len      Cov    Len      Cov    Len      Cov    Len
  5    5    5   2.1   98.50   1.37    89.97   0.91    80.80   0.69    74.03   1.00    97.53   0.89
                2.4   99.43   1.21    88.33   0.85    82.23   0.66    84.53   0.93    96.80   0.87
                2.7   99.27   0.94    86.37   0.67    79.17   0.49    62.80   0.66    95.00   0.80
                2.85  99.63   0.78    84.63   0.49    77.03   0.34    29.7    0.35    93.77   0.69
  4    6   10   2.1   96.80   1.21    87.33   0.87    78.73   0.66    82.47   1.07    97.40   0.85
                2.4   99.30   1.07    87.33   0.82    80.87   0.63    84.97   0.99    97.70   0.84
                2.7   98.77   0.86    84.70   0.64    78.67   0.47    74.50   0.70    97.30   0.75
                2.85  99.40   0.74    84.23   0.44    79.67   0.33    53.27   0.40    95.57   0.63
 10   10   10   2.1   98.60   1.01    92.47   0.68    89.73   0.60    89.87   0.92    96.33   0.66
                2.4   98.23   0.89    92.33   0.63    91.00   0.57    92.60   0.86    96.30   0.63
                2.7   99.40   0.68    91.33   0.49    98.97   0.44    90.97   0.68    95.97   0.53
                2.85  98.70   0.52    89.77   0.35    89.47   0.32    74.53   0.43    95.33   0.42
  8   12   20   2.1   96.63   0.90    91.27   0.64    87.97   0.57    89.20   0.88    96.13   0.62
                2.4   95.20   0.80    91.23   0.61    89.57   0.55    91.80   0.84    96.47   0.61
                2.7   98.83   0.63    90.00   0.47    89.03   0.42    90.73   0.67    96.60   0.51
                2.85  99.47   0.50    89.50   0.32    88.73   0.29    86.67   0.43    95.03   0.38
 20   20   20   2.1   97.80   0.72    93.43   0.49    92.20   0.46    92.33   0.69    95.57   0.48
                2.4   97.13   0.63    92.80   0.45    92.93   0.43    93.93   0.64    95.57   0.45
                2.7   96.80   0.48    92.03   0.35    92.03   0.34    94.50   0.52    95.03   0.37
                2.85  99.20   0.36    91.10   0.24    91.50   0.24    92.73   0.38    94.03   0.27
 24   16   30   2.1   95.53   0.66    92.83   0.47    91.17   0.44    92.77   0.68    95.93   0.45
                2.4   95.63   0.59    92.87   0.45    91.87   0.43    94.93   0.63    95.10   0.45
                2.7   97.27   0.46    92.33   0.35    91.10   0.33    94.43   0.51    94.60   0.37
                2.85  99.33   0.35    90.93   0.25    91.13   0.23    91.63   0.38    93.57   0.28
 30   30   30   2.1   97.37   0.59    94.33   0.40    92.67   0.39    94.93   0.57    94.57   0.40
                2.4   97.43   0.52    93.87   0.37    93.47   0.36    94.7    0.52    94.67   0.37
                2.7   97.00   0.40    93.73   0.29    93.13   0.28    95.00   0.42    94.53   0.30
                2.85  97.77   0.29    93.33   0.20    93.27   0.20    90.97   0.32    94.33   0.22

Cov - coverage probability; Len - length; BCa_P - bias corrected and accelerated/parametric estimate; BCa_E - bias corrected and accelerated/empirical estimate; GCI - generalized confidence interval
The only other CI that approaches the desired coverage probability is the GCI. The GCI intervals are on average 25% shorter than the fiducial intervals. However, the GCI is only appropriate for a classification system with a single feature that is independently and normally distributed for each class. In coverage, the GCI always outperforms the delta method CI, which is also constructed under the assumption of a normally distributed feature (this was already observed in Section 3.5 for the small sample size scenario).

The NP CI performs poorly with respect to coverage for highly accurate classifiers (BC = 0.1) and for all BC values when $n_j < 20$ (see Appendix B.2). Therefore, this CI is not appropriate as a nonparametric small-sample CI around BC. Both bootstrap CIs (with BC estimated either parametrically or empirically) perform poorly in small-sample scenarios, with the $BCa_P$ CI outperforming the $BCa_E$ CI. In general, the bootstrap BCa CI with a parametric estimate of BC performs very similarly to the delta method CI in both length and coverage, as was also seen in Section 3.5. The $BCa_E$ CI performs fairly well in coverage for $n_j > 20$, although its coverage drops for BC = 0.1. Also, as the coverage of the $BCa_E$ CI approaches the desired level (approximately 90-95%), its length becomes very similar to, and usually slightly worse than, the length of the fiducial interval. This suggests that a nonparametric method that meets the desired coverage may not be able to achieve shorter lengths than the fiducial interval.

The parametric estimate of BC (used for the delta, generalized, and $BCa_P$ CIs) has the lowest absolute bias (absolute bias ∈ [0.001, 0.209]), which is expected because this estimate is based on the same assumptions used to generate the simulated data. The empirically estimated BC (used for the $BCa_E$ and fiducial intervals) has larger bias than the parametric estimates (absolute bias ∈ [0.007, 0.278]), but bias similar to that seen in the simulation that used multinomial random variables. Finally, the bias of the BC estimate used for the NP CI increases substantially as $BC_2$ decreases (absolute bias ∈ [0.066, 0.356]). This trend in bias was also noted in [79] for J. The larger bias at smaller $BC_2$ may contribute to the decreased coverage of this method at lower BC values.

4.4.2  Unequal  Costs. 

To ensure that the performance of the fiducial CI is not degraded when the costs of misclassification are not equal, two additional cost scenarios (the same cost structures considered for the parametric CIs in Section 3.5) are considered for the three-class BC. In Section 3.5, the different cost structures do not affect CI performance for a normally distributed feature. For this reason, performance under the cost structures is demonstrated with the multinomial distribution only, as the performance with a normally distributed feature is not expected to differ from what is presented in Tables 4.6 and 4.7.

Additionally, only sample sizes up to $n_j = 20$ are considered because of the intensive computational time required when using the multinomial distributions. The same average total misclassification probabilities considered for the equal cost scenarios are used for this simulation, with the error probabilities distributed evenly across the classes. This results in different $BC_3$ values, where $BC_{3,Cost_1} = (0.4, 0.27, 0.13, 0.07)$ and $BC_{3,Cost_2} = (0.75, 0.5, 0.25, 0.125)$. The coverage probability and length of the CIs are presented in Table 4.8. As expected, the CI achieves a coverage probability of at least $1 - \alpha$. Notably, the CIs for the unequal cost scenarios are more conservative with respect to coverage than those for the equal cost scenarios, due to the step function required for finding the bounds. Finally, the bias of the estimated BC is similar to that in the previous sections (absolute bias ∈ [0.026, 0.137] for $Cost_1$ and absolute bias ∈ [0.053, 0.274] for $Cost_2$).

4.5  Comparisons  to  Multinomial  Methods 

One simple approach to a CI around BC is to construct simultaneous CIs around the multinomial probabilities resulting from the classification system, and then sum these bounds to obtain lower and upper bounds around BC:

$$BC_L = \sum_{j=1}^{k}\sum_{\substack{i=1\\ i\neq j}}^{k} p_{i|j,L} \quad (4.43)$$

$$BC_U = \sum_{j=1}^{k}\sum_{\substack{i=1\\ i\neq j}}^{k} p_{i|j,U} \quad (4.44)$$

where $[p_{i|j,L}, p_{i|j,U}]$ is the $(1-\alpha)100\%$ CI around $p_{i|j}$ found using a simultaneous CI method for the class' multinomial probabilities. With k classes, k sets of simultaneous CIs are needed, which may require an adjustment for multiple comparisons to construct the $(1-\alpha)100\%$ CI around BC.
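As an illustration of Equations 4.43 and 4.44, the sketch below applies one of the methods from Section 2.7.2.1, the Quesenberry and Hurst (1964) simultaneous intervals, to each class (column) of a three-class contingency table and sums the off-diagonal bounds. It is a minimal Python sketch under stated assumptions: equal weights ($c_{i|j}P_j = 1$), no multiplicity adjustment across the k columns, and k = 3 only, because the upper-α chi-square quantile with k - 1 = 2 degrees of freedom has the closed form $-2\ln\alpha$. The function names and the example table are hypothetical.

```python
from math import log, sqrt
from typing import List, Sequence, Tuple

def qh_intervals(counts: Sequence[int], chi2_crit: float) -> List[Tuple[float, float]]:
    """Quesenberry-Hurst simultaneous CIs for the cell probabilities of one multinomial.

    Each interval is {p : (n_i - N p)^2 <= A N p (1 - p)}, with A = chi2_crit.
    """
    n_total = sum(counts)
    a = chi2_crit
    intervals = []
    for n_i in counts:
        centre = a + 2 * n_i
        half = sqrt(a * (a + 4 * n_i * (n_total - n_i) / n_total))
        denom = 2 * (n_total + a)
        intervals.append(((centre - half) / denom, (centre + half) / denom))
    return intervals

def bc_bounds_from_qh(table: Sequence[Sequence[int]], alpha: float = 0.05) -> Tuple[float, float]:
    """Sum the off-diagonal QH bounds column by column (Equations 4.43 and 4.44)."""
    k = len(table)
    if k != 3:
        raise ValueError("the closed-form chi-square quantile below assumes k - 1 = 2")
    chi2_crit = -2.0 * log(alpha)          # P(chi2_2 > x) = exp(-x / 2) = alpha
    bc_l = bc_u = 0.0
    for j in range(k):                     # true class j: one multinomial per column
        column = [table[i][j] for i in range(k)]
        for i, (lo, hi) in enumerate(qh_intervals(column, chi2_crit)):
            if i != j:                     # misclassification cells only
                bc_l += lo
                bc_u += hi
    return bc_l, bc_u

# table[i][j]: class-j observations assigned to class i (5 per class); empirical BC = 1.0
example = [[4, 1, 0], [1, 3, 2], [0, 1, 3]]
print(bc_bounds_from_qh(example))
```

Even with only 15 observations, the summed bounds span a large part of the range of BC, which is consistent with the wide simultaneous-CI intervals reported in Section 4.5.1.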
Table 4.8: Simulation coverage probability for 95% fiducial intervals around BC for three classes and two different cost structures, making no assumptions on the classification system.

Cost1
 n1   n2   n3      BC3 = 0.4       BC3 = 0.27      BC3 = 0.13      BC3 = 0.07
                  Cov    Len      Cov    Len      Cov    Len      Cov    Len
  5    5    5   99.27   0.76    99.63   0.67    98.93   0.55    98.43   0.48
  4    6   10   98.93   0.70    99.40   0.61    99.63   0.50    99.50   0.42
 10   10   10   99.13   0.55    98.60   0.48    99.17   0.38    98.67   0.30
  8   12   20   98.50   0.52    98.93   0.46    99.13   0.37    99.53   0.30
 20   20   20   98.90   0.40    98.63   0.35    99.03   0.27    99.13   0.20

Cost2
 n1   n2   n3      BC3 = 0.75      BC3 = 0.5       BC3 = 0.25      BC3 = 0.125
                  Cov    Len      Cov    Len      Cov    Len      Cov    Len
  5    5    5   99.13   1.42    99.13   1.26    98.30   1.06    99.47   0.96
  4    6   10   98.53   1.12    98.20   0.98    97.90   0.80    98.50   0.67
 10   10   10   98.30   1.06    98.70   0.92    98.83   0.72    98.67   0.60
  8   12   20   97.87   0.82    97.87   0.71    98.13   0.55    98.27   0.43
 20   20   20   97.87   0.77    98.07   0.67    98.87   0.51    99.20   0.39

Cov - coverage probability; Len - length


This section considers the simultaneous CI methods for multinomial probabilities listed in Section 2.7.2.1 that may be used to produce a CI around BC. In [71], the performance of these methods is evaluated with respect to coverage probability. The Gold (1963) and Goodman (1965) methods have a minimum possible coverage probability of zero, which is not desirable [71]. The Quesenberry and Hurst, Fitzpatrick and Scott, and Sison and Glaz methods all have minimum coverage probabilities greater than zero, although notably not greater than $1 - \alpha$ [71]. These three methods, whose minimum coverage probabilities are greater than zero, are considered for constructing a CI around BC.

The final method considered (although it is not a simultaneous CI for multinomial proportions) is the Clopper-Pearson CI for a binomial proportion (presented in Section 2.7.1.1). Under the assumption that all $c_{i|j}P_j$ are equal, for $i \neq j$, the total misclassification probability for each class may be modeled as a binomial proportion, and therefore the Clopper-Pearson CI can be used. The Clopper-Pearson CI for binomial proportions has a minimum coverage probability of at least $1 - \alpha$ [3].
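A minimal sketch of the Clopper-Pearson interval, computed by bisection on exact binomial tail probabilities so that only the Python standard library is needed, is given below. How the per-class intervals are combined into a CI around BC is not spelled out here, so the helper `bc_bounds_clopper_pearson` assumes, by analogy with Equations 4.43 and 4.44, that the per-class bounds on the total misclassification probability are simply summed with no multiplicity adjustment; all function names are hypothetical.

```python
from math import comb
from typing import Callable, Sequence, Tuple

def _upper_tail(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x, n + 1))

def _lower_tail(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(0, x + 1))

def _bisect(f: Callable[[float], float], target: float, increasing: bool,
            iters: int = 100) -> float:
    """Solve f(p) = target on [0, 1] for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid        # root lies above mid
        else:
            hi = mid        # root lies at or below mid
    return (lo + hi) / 2

def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) 100(1 - alpha)% CI for a binomial proportion x / n."""
    lower = 0.0 if x == 0 else _bisect(lambda p: _upper_tail(x, n, p), alpha / 2, True)
    upper = 1.0 if x == n else _bisect(lambda p: _lower_tail(x, n, p), alpha / 2, False)
    return lower, upper

def bc_bounds_clopper_pearson(misclassified: Sequence[int], sizes: Sequence[int],
                              alpha: float = 0.05) -> Tuple[float, float]:
    """Assumed combination: sum per-class bounds on 1 - p_{j|j} (equal weights)."""
    bounds = [clopper_pearson(m, n, alpha) for m, n in zip(misclassified, sizes)]
    return sum(b[0] for b in bounds), sum(b[1] for b in bounds)

print(clopper_pearson(1, 5))
print(bc_bounds_clopper_pearson([1, 2, 2], [5, 5, 5]))   # empirical BC3 = 1.0
```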
4.5.1 Simulation Results.


A simulation was conducted to compare the performance of the Clopper-Pearson, Fitzpatrick and Scott, Quesenberry and Hurst, Sison and Glaz, Wald, and Log Wald methods when used to construct CIs around BC (Wald and Log Wald intervals are developed for BC in the Appendix, Section A.4). A three-class scenario with equal misclassification weights is assumed ($c_{i|j}P_j = 1$, $i \neq j$), and $n_j = 5$, 10, and 30 are considered. The coverage probability and length of the intervals over all values of BC (in increments of 0.01, with misclassification probabilities randomly assigned within all classes for each sample and fixed BC value) are determined using 10,000 simulation runs. Although some of the methods considered in this section require the construction of k sets of simultaneous CIs, no adjustment for multiple comparisons (such as the Bonferroni adjustment to α) is made, since these methods' resulting CIs around BC all have coverage above $1 - \alpha$ without an adjustment. A Bonferroni adjustment would only increase the coverage and length of the intervals, which is not desired for the comparison. The results are presented in Figure 4.3¹.

The simultaneous CI methods do not perform well with respect to CI length (although coverage is met) and are generally so wide that the CI would be useless. Also, as expected given the poor performance of the Wald CI for binomial proportions, the Wald and Log Wald methods do not meet the desired coverage, although they have shorter lengths. The performance of the fiducial interval is shown in Figure 4.3 for k = 3 and $n_j = 5$, 10, and 30 by the red line. Notably, the fiducial method outperforms the simultaneous CI methods, as it exceeds the desired coverage with much shorter lengths.


¹Discontinuities in the plots at BC = 1.0 and BC = 2.0 occur due to a change in how the probabilities were randomly assigned, which was necessary to ensure the BC values reached the desired levels.
Figure 4.3: Coverage probability (left) and length (right) of CI methods for 95% CIs using Wald (blue), Log Wald (light blue), Sison and Glaz (green), Fitzpatrick and Scott (purple), Quesenberry and Hurst (pink), Clopper and Pearson (dark green), and the fiducial interval developed in Section 4.2 (red).


4.6  Summary 

Although Fisher's fiducial argument was vigorously debated, and deemed "Fisher's biggest blunder" by Efron, the objections to the theory were philosophical and not based on the method's feasibility [18, 31, 77]. In fact, this chapter showed the fiducial interval to be a very useful and well-performing tool for constructing a CI around BC. The fiducial interval proposed in this chapter consistently meets the desired coverage probability across various classification scenarios. Although the CI is longer than other intervals, when a CI under a similar framework (the empirically estimated BCa bootstrap CI) comes close to the desired coverage, its length is similar to, and sometimes worse than, that of the fiducial interval. The fiducial interval was shown to outperform the Wald, log Wald, and all simultaneous CI methods for multinomial probabilities considered, with respect to coverage probability and length.

The fiducial interval performs well regardless of the underlying distribution, as demonstrated in the simulation section using classification systems with either no underlying distributions or a single normally distributed feature. When the feature is normally distributed, the GCI presented in Section 3.3 outperforms all other methods considered in length when coverage is met. Coverage was met by both the GCI and fiducial methods, although the coverage estimates were slightly lower for the GCI. The simulation suggests that the GCI's performance may drop as class size and classification accuracy increase in the three-class scenario. However, under the scenarios considered in the simulation in Section 4.4.1.2 for a normally distributed feature, the GCI is recommended. The utility of this CI is limited, however, as it is only appropriate for classification systems known to have thresholds between a feature's normal distributions for each class.

The fiducial interval has been developed to ensure that coverage is met. As a result, the interval exceeds the nominal coverage, producing interval lengths which may be seen as impractical. This is especially true for the very small samples in the simulation. However, the fiducial interval is the only method considered that guarantees coverage for any nonparametric scenario and sample size, and it may still provide useful information. For instance, in the simulation where each class has a sample of only size five, coverage is about 99% for all BC values considered. For the high BC values, the lengths cover more than half of the possible range of BC. Yet when the BC value is low, the fiducial intervals are short enough that a classification system performing better than chance would still be detected. Therefore, even in small-sample scenarios accurate systems may be detected, suggesting that this method is useful for, say, pilot studies of potential classifiers.

The fiducial method requires searching the parameter space in increments of a predetermined tolerance. Given the step functions required for finding the bounds for BC when misclassification costs are unequal, this tolerance should be chosen carefully. If the space is searched too coarsely, the upper or lower bound may be found to be too small or too large, respectively. This is demonstrated for a three-class example using the second cost structure ($Cost_2$). The top plot in Figure 4.4 shows the minimum coverage at all BC values when the bounds were found by searching the parameter space in increments of 0.05. The bottom plot in Figure 4.4 shows the minimum coverage at all BC values when the parameter space was searched in increments of 0.01. For a specific scenario, the minimum coverage achieved was clearly below the desired level of 95% when the space was searched too coarsely. This minimum coverage improves, however, when the space is searched more finely. Therefore, although the developed fiducial interval theoretically guarantees a coverage of $(1 - \alpha)100\%$, the increment used for searching the parameter space must be chosen carefully in practical implementations of the interval when costs are unequal.
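Why a coarse search shrinks the upper bound can be seen with a toy grid search. In the hypothetical sketch below, `accept` stands in for whatever criterion the fiducial construction uses to decide whether a candidate value lies inside the interval (that criterion is not reproduced here), and the exact upper bound is an assumed value of 0.537. Keeping the largest accepted grid point returns a value up to one increment below the exact bound, so the reported interval is too narrow and can undercover.

```python
from typing import Callable, Optional

def grid_upper_bound(accept: Callable[[float], bool], lo: float, hi: float,
                     step: float) -> Optional[float]:
    """Largest grid point lo + i * step in [lo, hi] for which accept(x) is True."""
    n_steps = int(round((hi - lo) / step))
    best = None
    for i in range(n_steps + 1):
        x = lo + i * step
        if accept(x):
            best = x
    return best

TRUE_UPPER = 0.537                      # hypothetical exact upper bound

def accept(b: float) -> bool:
    """Hypothetical monotone acceptance rule: values up to the exact bound are accepted."""
    return b <= TRUE_UPPER

coarse = grid_upper_bound(accept, 0.0, 3.0, 0.05)   # increment used for the top plot
fine = grid_upper_bound(accept, 0.0, 3.0, 0.01)     # increment used for the bottom plot
print(coarse, fine)                                 # the coarse search falls further short
```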


Figure 4.4: Minimum coverage of fiducial intervals when searching the probability space with coarser and finer increments, δ = 0.05 (top) and δ = 0.01 (bottom); horizontal axis: BC
This chapter provides a nonparametric CI for any k-class BC which does not rely on information about the classification system used to construct the interval. This CI may be applied once the optimal thresholds have been selected, has the advantage of working for any classifier regardless of scenario, and achieves the desired coverage probability. Therefore, in situations with small sample sizes or where the underlying distributions of the feature for each class are non-normal or unknown, this fiducial interval provides a very useful and flexible tool for quantifying the uncertainty in BC. Finally, although this method is developed for applications with BC, it may be used to construct a CI around any linear combination of multinomial or binomial probabilities.
V. Parametric Hypothesis Tests


5.1  Introduction 

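The methods proposed in this chapter assume a classification system with a single feature that is independently and normally distributed for each class, and a threshold between ordered classes to distinguish any number k of classes. Under this framework, recall that BC can be written with the normal CDF, where the minimization is omitted because it is achieved by using the k - 1 optimal thresholds ($\theta_m$, $m = 1, \ldots, k-1$):

$$\begin{aligned}
BC ={}& \sum_{j=2}^{k} c_{1|j}P_j\,\Phi\!\left(\frac{\theta_1-\mu_j}{\sigma_j}\right) + \sum_{i=2}^{k-1}\sum_{\substack{j=1\\ j\neq i}}^{k} c_{i|j}P_j\left[\Phi\!\left(\frac{\theta_i-\mu_j}{\sigma_j}\right)-\Phi\!\left(\frac{\theta_{i-1}-\mu_j}{\sigma_j}\right)\right] \\
&+ \sum_{j=1}^{k-1} c_{k|j}P_j\left[1-\Phi\!\left(\frac{\theta_{k-1}-\mu_j}{\sigma_j}\right)\right]
\end{aligned} \quad (3.9)$$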


Two different types of hypothesis tests are developed. First, for a single classification system, it may be of interest to test a one-sided hypothesis on BC in order to determine whether the system performs at least as well as some pre-specified classification accuracy level (measured by BC). For instance, one may be interested in determining whether a system performs better than chance. Lower values of BC correspond to better classification accuracy, resulting in hypotheses of the form

$$H_0: BC \geq BC_0 \quad \text{vs.} \quad H_1: BC < BC_0 \quad (5.1)$$


Secondly, it may also be of interest to compare the BC values of two competing classification systems at their optimal points, in order to determine whether one has superior classification performance. This hypothesis test may be of greater interest to decision makers because it provides information useful for choosing a classification system without having to specify a BC threshold ($BC_0$). It is assumed that both classification systems are independent and have the same number of classes, and that the feature used by each classification system is independently and normally distributed for each class. The two classification systems being compared are denoted classification system A and classification system B. Define the difference between the two BC values from these systems as

$$\eta = BC_A - BC_B \quad (5.2)$$
The hypothesis used to compare their performance is of the form

$$H_0: \eta \leq \eta_0 \quad \text{vs.} \quad H_1: \eta > \eta_0 \quad (5.3)$$

This hypothesis may be written for the specific case

$$H_0: BC_A \leq BC_B \quad \text{vs.} \quad H_1: BC_A > BC_B \quad (5.4)$$

which is equivalent to testing at $\eta_0 = 0$ (no difference in performance). Higher values of BC indicate a classification system with poorer performance, and therefore the alternative hypothesis reflects the case in which classification system B performs better than classification system A.

In Section 5.2, the delta method is used to develop both types of hypothesis tests assuming large sample sizes. In Section 5.3, both hypothesis tests are developed using a generalized hypothesis testing method for any sample size. A simulation demonstrating the performance (size and power) of the proposed hypothesis tests is presented in Section 5.4. Finally, a summary of the findings is presented in Section 5.5.

5.2  Delta  Method  Hypothesis  Tests 

5.2.1  One-sided  Hypothesis  Test  on  a  Single  Bayes  Cost  Value. 

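Recall from Section 3.2 that, for a normally distributed feature, $\widehat{BC}$ is approximately $N(BC, Var(\widehat{BC}))$ as $n \to \infty$, and the variance of $\widehat{BC}$ is estimated via the delta method with

$$\widehat{Var}(\widehat{BC}) = \sum_{j=1}^{k}\left[\left(\frac{\partial BC}{\partial \mu_j}\right)^2 \widehat{Var}(\hat\mu_j) + \left(\frac{\partial BC}{\partial \sigma_j}\right)^2 \widehat{Var}(\hat\sigma_j)\right] \quad (3.10)$$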


The partial derivatives for the three- and four-class BC are presented in Section 3.2.1 and Appendix A.3, respectively. However, for any number of classes, the partial derivatives are easily estimated numerically with the two-point central difference method (Section 3.2.3). After the partial derivatives and the variance of $\widehat{BC}$ are estimated, the one-sided hypothesis

$$H_0: BC \geq BC_0 \quad \text{vs.} \quad H_1: BC < BC_0 \quad (5.1)$$

is tested by calculating a p-value from the observed sample. The p-value is developed using the following theorem.
Theorem 7 (Valid P-value).

Let W(X) be a test statistic such that large values of W give evidence that $H_1$ is true. For each sample point x, define

$$p(x) = \sup_{\theta \in \Theta_0} P_\theta\left(W(X) \geq W(x)\right)$$

Then p(X) is a valid p-value. [12, p. 397]

Let

$$W(X) = \frac{\widehat{BC} - BC_0}{\sqrt{\widehat{Var}(\widehat{BC})}}.$$

When $BC = BC_0$, W(X) is distributed standard normal as $n \to \infty$. However, large values of W(X) give evidence that $H_1$ is false. Therefore, to test the one-sided hypothesis in Equation 5.1 with this test statistic, the p-value is determined as

$$p(x) = \sup_{\theta \in \Theta_0} P_\theta\left(W(X) \leq W(x)\right) \quad (5.5)$$

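For any arbitrary $BC' \ge BC_0$, $\widehat{BC} - BC_0 \ge \widehat{BC} - BC'$ and

$$P\left(Z \le \frac{\widehat{BC} - BC_0}{\sqrt{\widehat{Var}(\widehat{BC})}}\right) \ge P\left(Z \le \frac{\widehat{BC} - BC'}{\sqrt{\widehat{Var}(\widehat{BC})}}\right),$$

where $Z$ is a standard normal random variable and the right-hand side is the large-sample value of $P_{BC'}\big(W(\mathbf{X}) \le W(\mathbf{x})\big)$. The supremum over $\Theta_0$ is therefore attained at the boundary $BC = BC_0$.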

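Therefore, the p-value for this hypothesis test is given by

$$p(\mathbf{x}) = \sup_{\theta \in \Theta_0} P_\theta\big(W(\mathbf{X}) \le W(\mathbf{x})\big) = P\left(Z \le \frac{\widehat{BC} - BC_0}{\sqrt{\widehat{Var}(\widehat{BC})}}\right). \qquad (5.2)$$

At the $\alpha$ significance level, $H_0$ is rejected for $W(\mathbf{x}) < z_\alpha$ or $p(\mathbf{x}) < \alpha$, where $z_\alpha$ is the $\alpha$ quantile of the standard normal distribution.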
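The delta method test above can be sketched in Python. This is a minimal illustration under assumptions not stated in the text, not the code used for the simulations (which were run in R): the thresholds are supplied (for example, the estimated optimal thresholds) and held fixed while differentiating, which by the envelope theorem matches the first-order derivative of the re-optimized BC when the optimum is interior; each partial derivative is a two-point central difference; and the large-sample variances $\widehat{Var}(\hat{\mu}_j) = \hat{\sigma}_j^2/n_j$ and $\widehat{Var}(\hat{\sigma}_j) = \hat{\sigma}_j^2/(2n_j)$ are assumed. The function names and the `cp[i][j]` $= C_{i|j}P_j$ layout are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, Phi(x); math.erf handles +/-inf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bayes_cost(mu, sigma, theta, cp):
    """Bayes Cost for k ordered normal classes split by thresholds theta[0] < ... < theta[k-2].

    cp[i][j] is C_{i|j} * P_j: cost of assigning a class-j item to class i, times P_j.
    """
    k = len(mu)
    edges = [-math.inf, *theta, math.inf]
    total = 0.0
    for j in range(k):          # true class
        for i in range(k):      # assigned class
            if i != j:
                p_ij = (norm_cdf((edges[i + 1] - mu[j]) / sigma[j])
                        - norm_cdf((edges[i] - mu[j]) / sigma[j]))
                total += cp[i][j] * p_ij
    return total


def delta_method_var(mu, sigma, n, theta, cp, h=1e-5):
    """Delta-method variance of BC-hat in the form of Eq. 3.10, thresholds held fixed."""
    var = 0.0
    for j in range(len(mu)):
        mu_up, mu_dn = list(mu), list(mu)
        mu_up[j] += h
        mu_dn[j] -= h
        d_mu = (bayes_cost(mu_up, sigma, theta, cp)
                - bayes_cost(mu_dn, sigma, theta, cp)) / (2 * h)
        sd_up, sd_dn = list(sigma), list(sigma)
        sd_up[j] += h
        sd_dn[j] -= h
        d_sd = (bayes_cost(mu, sd_up, theta, cp)
                - bayes_cost(mu, sd_dn, theta, cp)) / (2 * h)
        # Assumption: Var(mu_j-hat) = s_j^2 / n_j and Var(sigma_j-hat) = s_j^2 / (2 n_j)
        var += d_mu**2 * sigma[j]**2 / n[j] + d_sd**2 * sigma[j]**2 / (2 * n[j])
    return var


def delta_p_value(bc_hat, var_hat, bc0):
    """W statistic and p-value of Eq. 5.2 for H0: BC >= BC0 vs H1: BC < BC0."""
    w = (bc_hat - bc0) / math.sqrt(var_hat)
    return w, norm_cdf(w)


# Usage with the first equal-cost scenario of Table 5.1 (all C_{i|j}P_j = 1, i != j)
mu, sigma = [-2.879, 0.0, 2.879], [1.0, 1.0, 1.0]
theta = [-1.4395, 1.4395]           # midpoints; optimal here by symmetry
cp = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
bc = bayes_cost(mu, sigma, theta, cp)                       # about 0.30
var = delta_method_var(mu, sigma, [50, 50, 50], theta, cp)
w, p = delta_p_value(bc, var, bc0=0.5)
```

Here the Table 5.1 parameters are plugged in as if they were estimates, purely to exercise the functions; with real data, $\hat{\mu}_j$, $\hat{\sigma}_j$, and the estimated optimal thresholds would be used instead.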

5.2.2  One-sided  Hypothesis  Test  on  the  Difference  of  Two  Bayes  Cost  Values. 
For testing the hypothesis

$$H_0: \eta \le \eta_0 \quad \text{vs.} \quad H_1: \eta > \eta_0 \qquad (5.3)$$
the parameter of interest, $\eta$, is a function of the normal distribution parameters $\mu_{j,A}$, $\sigma_{j,A}$, $\mu_{j,B}$, and $\sigma_{j,B}$, $j = 1, \ldots, k$:


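$$\eta = BC_A - BC_B \qquad (5.8)$$

where, for each system $s \in \{A, B\}$ with optimal thresholds $\theta^*_{1,s} < \cdots < \theta^*_{k-1,s}$ and $\Phi$ denoting the standard normal CDF,

$$BC_s = \sum_{j=2}^{k} C_{1|j}P_j\,\Phi\!\left(\frac{\theta^*_{1,s}-\mu_{j,s}}{\sigma_{j,s}}\right) + \sum_{i=2}^{k-1}\sum_{\substack{j=1\\ j\neq i}}^{k} C_{i|j}P_j\left[\Phi\!\left(\frac{\theta^*_{i,s}-\mu_{j,s}}{\sigma_{j,s}}\right)-\Phi\!\left(\frac{\theta^*_{i-1,s}-\mu_{j,s}}{\sigma_{j,s}}\right)\right] + \sum_{j=1}^{k-1} C_{k|j}P_j\left[1-\Phi\!\left(\frac{\theta^*_{k-1,s}-\mu_{j,s}}{\sigma_{j,s}}\right)\right]$$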


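Therefore, from the multivariate delta method (Theorem 4), $\hat{\eta} = g(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\sigma}})$ is asymptotically normal, $\hat{\eta} \sim AN\big(\eta, Var(\hat{\eta})\big)$, and $Var(\hat{\eta})$ is estimated by

$$\widehat{Var}(\hat{\eta}) = \sum_{j=1}^{k}\left[\left(\frac{\partial \eta}{\partial \mu_{j,A}}\right)^{2}\widehat{Var}(\hat{\mu}_{j,A}) + \left(\frac{\partial \eta}{\partial \sigma_{j,A}}\right)^{2}\widehat{Var}(\hat{\sigma}_{j,A}) + \left(\frac{\partial \eta}{\partial \mu_{j,B}}\right)^{2}\widehat{Var}(\hat{\mu}_{j,B}) + \left(\frac{\partial \eta}{\partial \sigma_{j,B}}\right)^{2}\widehat{Var}(\hat{\sigma}_{j,B})\right] \qquad (5.9)$$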


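Covariance terms are excluded due to the assumed independence of the normal distributions for each class and of the two classification systems being compared (and, for normal samples, the independence of $\hat{\mu}_j$ and $\hat{\sigma}_j$). Because $\eta = BC_A - BC_B$ and each BC value depends only on the parameters of the classification system from which it was derived,

$$\left(\frac{\partial \eta}{\partial \gamma_{j,A}}\right)^{2} = \left(\frac{\partial BC_A}{\partial \gamma_{j,A}}\right)^{2} \qquad (5.10)$$

and

$$\left(\frac{\partial \eta}{\partial \gamma_{j,B}}\right)^{2} = \left(\frac{\partial BC_B}{\partial \gamma_{j,B}}\right)^{2} \qquad (5.11)$$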
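(where $\gamma_j = \mu_j$ or $\sigma_j$, and $j = 1, \ldots, k$). Then Equation 5.9 may be rewritten as

$$\widehat{Var}(\hat{\eta}) = \sum_{j=1}^{k}\left[\left(\frac{\partial BC_A}{\partial \mu_{j,A}}\right)^{2}\widehat{Var}(\hat{\mu}_{j,A}) + \left(\frac{\partial BC_A}{\partial \sigma_{j,A}}\right)^{2}\widehat{Var}(\hat{\sigma}_{j,A}) + \left(\frac{\partial BC_B}{\partial \mu_{j,B}}\right)^{2}\widehat{Var}(\hat{\mu}_{j,B}) + \left(\frac{\partial BC_B}{\partial \sigma_{j,B}}\right)^{2}\widehat{Var}(\hat{\sigma}_{j,B})\right] \qquad (5.12)$$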


The partial derivatives required for estimating the variance of $\hat{\eta}$ in Equation 5.12 are the partial derivatives of BC. Recall that these are presented in Section 3.2.1 for three classes and in Appendix A.3 for four classes, and that they can be estimated for any $k$ classes with the two-point central difference method presented in Section 3.2.3.


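Similar to the previous section, the test statistic is $W(\mathbf{X}) = \dfrac{\hat{\eta} - \eta_0}{\sqrt{\widehat{Var}(\hat{\eta})}}$, where large values of $W(\mathbf{X})$ give evidence that $H_1$ is true. For $\eta = \eta_0$, $W(\mathbf{X})$ is distributed standard normal as $n \to \infty$, and, by Theorem 7 with the supremum again attained at the boundary $\eta = \eta_0$, the p-value for this hypothesis test is

$$p(\mathbf{x}) = P\left(Z > \frac{\hat{\eta} - \eta_0}{\sqrt{\widehat{Var}(\hat{\eta})}}\right). \qquad (5.13)$$

At the $\alpha$ significance level, $H_0$ is rejected for $W(\mathbf{x}) > z_{1-\alpha}$ or $p(\mathbf{x}) < \alpha$.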
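Because the two systems are independent, Equation 5.12 amounts to adding the two single-system delta-method variances. A minimal sketch of the resulting test, assuming each system's $\widehat{BC}$ and its delta-method variance (Equation 3.10) have already been computed; the function name and the input values are hypothetical:

```python
import math


def norm_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def delta_difference_test(bc_a, var_a, bc_b, var_b, eta0=0.0, alpha=0.05):
    """Delta-method test of H0: eta <= eta0 vs H1: eta > eta0, with eta = BC_A - BC_B.

    Assumption (Eq. 5.12): independent systems, so Var(eta-hat) = Var(BC_A-hat) + Var(BC_B-hat).
    """
    eta_hat = bc_a - bc_b
    w = (eta_hat - eta0) / math.sqrt(var_a + var_b)
    p = 1.0 - norm_cdf(w)                     # Eq. 5.13: P(Z > W(x))
    return {"eta_hat": eta_hat, "W": w, "p": p, "reject": p < alpha}


# Hypothetical inputs: system B has the lower (better) Bayes Cost
result = delta_difference_test(bc_a=0.80, var_a=0.002, bc_b=0.50, var_b=0.002)
```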


5.3  Generalized  Hypothesis  Tests 

Let $\xi = (\theta, \boldsymbol{\delta})$, where $\theta$ is the parameter of interest and $\boldsymbol{\delta}$ is a vector of nuisance parameters.

Definition 4 (Generalized Test Variable).

A random variable of the form $T = T(\mathbf{X}; \mathbf{x}, \xi)$ is said to be a generalized test variable if it has the following three properties:

Property 1: $t_{obs} = T(\mathbf{x}; \mathbf{x}, \xi)$ does not depend on unknown parameters.

Property 2: When $\theta$ is specified, $T$ has a probability distribution that is free of nuisance parameters.

Property 3: For fixed $\mathbf{x}$ and $\boldsymbol{\delta}$, $Pr(T \le t; \theta)$ is a monotonic function of $\theta$ for any given $t$. [73, p. 115]

If $T$ is a generalized test variable which is stochastically decreasing in $\theta$, the generalized p-value for testing $H_0: \theta \le \theta_0$ vs. $H_1: \theta > \theta_0$ can be found as [73, p. 119]

$$p(\mathbf{x}) = Pr\big(T \le t_{obs} \mid \theta = \theta_0\big) \qquad (5.14)$$
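5.3.1 One-sided Hypothesis Test on a Single Bayes Cost Value.

For testing hypotheses of the form

$$H_0: BC \ge BC_0 \quad \text{vs.} \quad H_1: BC < BC_0 \qquad (5.1)$$

the parameter of interest is the $k$-class BC defined in Equation 3.9, which is a function of the nuisance parameters $\mu_j$ and $\sigma_j$, $j = 1, \ldots, k$. Define $T = T(\mathbf{X}; \mathbf{x}, \xi)$ as

$$T = R_{BC} - BC \qquad (5.15)$$

where $R_{BC}$, the generalized pivotal quantity (GPQ) for BC defined previously in Equation 3.25, is obtained by evaluating the $k$-class BC of Equation 3.9 at the GPQs $R_{\mu_j}$, $R_{\sigma_j}$, and $R_{\theta^*}$ in place of $\mu_j$, $\sigma_j$, and the optimal thresholds $\theta^*$.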
It was shown in Section 3.3.3 that $R_{BC}$ is free of unknown parameters. Also recall that the GPQs for the optimal thresholds ($R_{\theta^*}$) are found numerically (when the $C_{i|j}P_j$, $i \ne j$, are not all equal). As seen in Section 3.3.1, for each class (indexed by $j = 1, \ldots, k$)

$$R_{\mu_j} = \bar{x}_j - t_j\frac{s_j}{\sqrt{n_j}} \qquad (3.16)$$

and

$$R_{\sigma_j} = \sqrt{\frac{(n_j-1)s_j^2}{V_j}} \qquad (3.17)$$

where

$$t_j = \frac{\bar{X}_j - \mu_j}{S_j/\sqrt{n_j}} \qquad (3.18)$$

and

$$V_j = \frac{(n_j-1)S_j^2}{\sigma_j^2} \qquad (3.19)$$

where $t_j \sim t_{(n_j-1)}$, a t-distributed random variable with $n_j - 1$ degrees of freedom, and $V_j \sim \chi^2_{n_j-1}$, a chi-square random variable with $n_j - 1$ degrees of freedom [12, pp. 218, 223]. The observed value of $T$, $t_{obs} = T(\mathbf{x}; \mathbf{x}, \xi)$, is evaluated by using the observed $\bar{x}_j$ and $s_j$ in Equations 3.18 and 3.19 and then substituting Equations 3.18 and 3.19 into Equations 3.16 and 3.17, respectively. This results in $R_{\mu_j}(\mathbf{x}, \mathbf{s}) = \mu_j$, $R_{\sigma_j}(\mathbf{x}, \mathbf{s}) = \sigma_j$, and the numerically estimated $R_{\theta^*}(\mathbf{x}, \mathbf{s}) = \theta^*$. Substituting these into Equation 3.25 and Equation 5.15 results in

$$t_{obs} = BC - BC = 0. \qquad (5.16)$$


Therefore, Property 1 of Definition 4 is met, since $t_{obs}$ does not depend on unknown parameters. Property 2 is met because $R_{BC}$ is free of unknown parameters, which implies that when BC is specified, $T$ does not depend on any nuisance parameters. Finally, for Property 3, let the distribution function of $R_{BC}$, which is free of unknown parameters, be denoted $F_{R_{BC}}$.

Since $T = R_{BC} - BC$, the CDF of $T$ may be written as

$$Pr(T \le t) = Pr(R_{BC} \le t + BC) = F_{R_{BC}}(t + BC). \qquad (5.17)$$

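Therefore, BC is a location parameter for the distribution of $T$, implying that the CDF of $T$ is a monotonic (non-decreasing) function of BC [12, pp. 116, 134], [73, p. 117]. All three properties of Definition 4 are met for $T$ defined in Equation 5.15, and therefore $T$ is a generalized test variable which is stochastically decreasing in BC. Because the alternative here is $H_1: BC < BC_0$, the reverse of the direction in Equation 5.14, small values of BC produce large values of $T$, so the inequality in Equation 5.14 is reversed. From Equations 5.14 and 5.15, the generalized p-value for this test is given by

$$\begin{aligned} p(\mathbf{x}) &= Pr\big(T \ge t_{obs} \mid BC = BC_0\big) \\ &= Pr\big(R_{BC} - BC \ge t_{obs} \mid BC = BC_0\big) \\ &= Pr\big(R_{BC} - BC_0 \ge 0\big) \\ &= Pr\big(R_{BC} \ge BC_0\big). \end{aligned} \qquad (5.18)$$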
The probability in Equation 5.18 is evaluated via Monte Carlo methods by generating a large number of values of $R_{BC}$ (in the same manner as in Section 3.3.3 for the GCI) and then determining the proportion of these values that satisfy the inequality in Equation 5.18 [73, p. 119].
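A minimal Monte Carlo sketch of Equation 5.18 for $k \ge 3$ ordered normal classes follows. It rests on assumptions that go beyond the text: the GPQ draws follow Equations 3.16 through 3.19, but the optimal-threshold GPQs are approximated by a grid search rather than by the numerical optimization described above; the search uses the fact that, for ordered classes, BC splits into one term per threshold, $BC = \sum_j C_{k|j}P_j + \sum_{m=1}^{k-1}\sum_j (C_{m|j}-C_{m+1|j})P_j\,\Phi\big((\theta_m-\mu_j)/\sigma_j\big)$, so each threshold can be minimized on its own; and the ordering $\theta_1 < \cdots < \theta_{k-1}$ is not enforced, which is reasonable only for well-separated classes. The function names are hypothetical.

```python
import numpy as np
from scipy.special import ndtr  # standard normal CDF


def gpq_bayes_cost(xbar, s, n, cp, n_draws=2000, grid_size=401, seed=None):
    """Monte Carlo draws of R_BC for k ordered normal classes (illustrative sketch).

    cp[i, j] = C_{i|j} * P_j with a zero diagonal; classes are ordered by mean.
    """
    rng = np.random.default_rng(seed)
    xbar, s, n = (np.asarray(a, dtype=float) for a in (xbar, s, n))
    cp = np.asarray(cp, dtype=float)
    k, df = xbar.size, n - 1
    # GPQs for each class (Eqs. 3.16-3.19): t ~ t_{n-1}, V ~ chi^2_{n-1}
    t = rng.standard_t(df, size=(n_draws, k))
    v = rng.chisquare(df, size=(n_draws, k))
    r_mu = xbar - t * s / np.sqrt(n)
    r_sigma = np.sqrt(df * s**2 / v)
    # Assumption: candidate thresholds on a fixed grid spanning all classes
    grid = np.linspace(xbar.min() - 4 * s.max(), xbar.max() + 4 * s.max(), grid_size)
    # cdf[d, j, g] = Phi((grid[g] - R_mu[d, j]) / R_sigma[d, j])
    cdf = ndtr((grid[None, None, :] - r_mu[:, :, None]) / r_sigma[:, :, None])
    # Separable form: threshold m contributes sum_j (cp[m, j] - cp[m+1, j]) * cdf_j
    weights = cp[:-1, :] - cp[1:, :]                      # shape (k-1, k)
    per_threshold = np.einsum("mj,djg->dmg", weights, cdf)
    # Minimizing each threshold term over the grid approximates R_BC at R_theta*
    return cp[-1, :].sum() + per_threshold.min(axis=2).sum(axis=1)


def generalized_p_single(xbar, s, n, cp, bc0, **kwargs):
    """Generalized p-value of Eq. 5.18 for H0: BC >= BC0 vs H1: BC < BC0."""
    r_bc = gpq_bayes_cost(xbar, s, n, cp, **kwargs)
    return float(np.mean(r_bc >= bc0))


# Usage: simulated samples from the first equal-cost scenario of Table 5.1
rng = np.random.default_rng(1)
samples = [rng.normal(m, 1.0, size=250) for m in (-2.879, 0.0, 2.879)]
xbar = [x.mean() for x in samples]
s = [x.std(ddof=1) for x in samples]
cp = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
p_reject = generalized_p_single(xbar, s, [250] * 3, cp, bc0=0.5, seed=2)
p_keep = generalized_p_single(xbar, s, [250] * 3, cp, bc0=0.2, seed=2)
```

With a true BC near 0.30, testing against $BC_0 = 0.5$ should give a small p-value and testing against $BC_0 = 0.2$ a large one.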

5.3.2  One-sided  Hypothesis  Test  on  the  Difference  of  Two  Bayes  Cost  Values. 

For testing the hypothesis

$$H_0: \eta \le \eta_0 \quad \text{vs.} \quad H_1: \eta > \eta_0 \qquad (5.3)$$

recall that the parameter of interest, $\eta$, is a function of the nuisance parameters $\mu_{j,A}$, $\sigma_{j,A}$, $\mu_{j,B}$, and $\sigma_{j,B}$, $j = 1, \ldots, k$ (Equation 5.8). Now define $T = T(\mathbf{X}; \mathbf{x}, \xi)$ as

$$T = R_\eta - \eta \qquad (5.19)$$

where $R_\eta$ is defined as

$$R_\eta = R_{BC_A} - R_{BC_B} \qquad (5.20)$$

and $R_{BC_A}$ and $R_{BC_B}$ are defined as in Equation 3.25, using Equations 3.16 through 3.19 with the appropriate sample mean, standard deviation, and sample size for each class within each system. Following the same reasoning as in Section 5.3.1, $t_{obs} = 0$ and all three properties of Definition 4 are met for $T$ in Equation 5.19. Thus, $T = R_\eta - \eta$ is a generalized test variable which is stochastically decreasing in $\eta$. From Equations 5.14 and 5.19, the generalized p-value for this test is

$$\begin{aligned} p(\mathbf{x}) &= Pr\big(T \le t_{obs} \mid \eta = \eta_0\big) \\ &= Pr\big(R_\eta - \eta \le t_{obs} \mid \eta = \eta_0\big) \\ &= Pr\big(R_\eta \le \eta_0\big) \\ &= Pr\big(R_{BC_A} - R_{BC_B} \le \eta_0\big). \end{aligned} \qquad (5.21)$$

The probability in Equation 5.21 is evaluated via Monte Carlo methods by generating a large number of values of $R_{BC_A} - R_{BC_B}$ (in the same manner as in Section 3.3.3 for the GCI, except that two BC GPQs are now found, one for each classification system, and their difference is stored). The proportion of these values that satisfy the inequality in Equation 5.21 is then determined [73, p. 119].
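Extending the Monte Carlo evaluation to Equation 5.21 only requires independent $R_{BC}$ draws for each system. The sketch below is self-contained and rests on the same assumptions as any grid-based approximation: the optimal-threshold GPQs are found by a grid search over a per-threshold decomposition of BC for ordered classes (not the optimizer used in the simulations), and the threshold ordering is not enforced. The helper names and the scenario are hypothetical.

```python
import numpy as np
from scipy.special import ndtr  # standard normal CDF


def r_bc_draws(xbar, s, n, cp, rng, n_draws=2000, grid_size=401):
    """GPQ draws of BC for one system (Eqs. 3.16-3.19); thresholds by grid search (assumption)."""
    xbar, s, n, cp = (np.asarray(a, dtype=float) for a in (xbar, s, n, cp))
    df = n - 1
    r_mu = xbar - rng.standard_t(df, size=(n_draws, xbar.size)) * s / np.sqrt(n)
    r_sigma = np.sqrt(df * s**2 / rng.chisquare(df, size=(n_draws, xbar.size)))
    grid = np.linspace(xbar.min() - 4 * s.max(), xbar.max() + 4 * s.max(), grid_size)
    cdf = ndtr((grid - r_mu[:, :, None]) / r_sigma[:, :, None])       # (draws, k, grid)
    terms = np.einsum("mj,djg->dmg", cp[:-1] - cp[1:], cdf)            # one term per threshold
    return cp[-1].sum() + terms.min(axis=2).sum(axis=1)


def generalized_p_difference(sys_a, sys_b, cp, eta0=0.0, seed=None):
    """Generalized p-value of Eq. 5.21 for H0: eta <= eta0 vs H1: eta > eta0.

    sys_a and sys_b are (xbar, s, n) tuples; the same cost structure cp is used for both.
    """
    rng = np.random.default_rng(seed)
    r_eta = r_bc_draws(*sys_a, cp, rng) - r_bc_draws(*sys_b, cp, rng)   # Eq. 5.20
    return float(np.mean(r_eta <= eta0))


# Usage with simulated data (hypothetical scenario, equal costs)
rng = np.random.default_rng(7)


def summarize(means, size):
    """Sample means, SDs, and sizes for normal classes with unit variance."""
    xs = [rng.normal(m, 1.0, size=size) for m in means]
    return [x.mean() for x in xs], [x.std(ddof=1) for x in xs], [size] * len(xs)


cp = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
sys_a = summarize((-1.349, 0.0, 1.349), 100)   # BC near 1.00 (Table 5.1)
sys_b = summarize((-2.879, 0.0, 2.879), 100)   # BC near 0.30 (Table 5.1)
p_ab = generalized_p_difference(sys_a, sys_b, cp, seed=11)   # B clearly better: small p
p_ba = generalized_p_difference(sys_b, sys_a, cp, seed=11)   # roles swapped: large p
```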
5.4 Simulation Results

A simulation study was conducted to demonstrate the performance of the delta method and generalized hypothesis tests for BC and $\eta$. Various scenarios are considered, including different sample sizes ($n_j = 10, 50, 100, 250$), underlying distributions of the feature used for classification (normal and gamma), differing costs associated with the misclassification outcomes, and classification accuracy (measured by the BC or $\eta$ value). All scenarios assume a classifier with three classes and two thresholds ($\theta_1 < \theta_2$) to distinguish between adjacent classes.

All scenarios use 3000 simulation runs in R with a significance level of $\alpha = 0.05$. When required, numerical minimization is performed using the optim function in R ("L-BFGS-B" method) [52]. Performance of the hypothesis tests is measured in the simulation by estimating the size and power of each test.

Definition 5 (Power Function).

The power function of a hypothesis test with rejection region $R$ is the function of $\theta$ defined by $\beta(\theta) = P_\theta(\mathbf{X} \in R)$. [12, p. 383]

Definition 6 (Size $\alpha$ Test).

For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a size $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$. [12, p. 385]

To evaluate the performance of the hypothesis tests, the probability of rejecting the null hypothesis is determined for multiple BC (or $\eta$) values. The power function for a fixed sample size is monotone in $\theta$ (see, for example, Figure 6.1). Therefore, $\beta(\theta)$ is first determined at the boundary of the null and alternative parameter spaces ($BC = BC_0$, $\eta = \eta_0$) to estimate the size of the test ($\sup_{\theta \in \Theta_0} \beta(\theta)$). Then values in the alternative hypothesis space ($BC < BC_0$, $\eta > \eta_0$) are evaluated to estimate the power at increasing distances from the boundary. In Section 5.4.1, the performance of the one-sided hypothesis test on a single BC value is evaluated, and in Section 5.4.2, the performance of the one-sided hypothesis test on the difference of two independent BC values is evaluated.
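Each estimated size or power in the tables that follow is a proportion of rejections over 3000 simulation runs, so it carries Monte Carlo error. Treating the runs as independent Bernoulli trials (an assumption about the simulation, not a result reported by it), the standard error of an estimated rejection rate $\hat{\beta}$ is $\sqrt{\hat{\beta}(1-\hat{\beta})/3000}$. A short sketch with hypothetical helper names:

```python
import math


def rejection_rate_se(rate, runs=3000):
    """Binomial standard error of a rejection proportion estimated from `runs` trials."""
    return math.sqrt(rate * (1.0 - rate) / runs)


def size_z_score(size_hat, alpha=0.05, runs=3000):
    """How many standard errors an estimated size lies above the nominal alpha."""
    return (size_hat - alpha) / rejection_rate_se(alpha, runs)


se_nominal = rejection_rate_se(0.05)   # about 0.004
z_061 = size_z_score(0.061)            # about 2.8 standard errors above 0.05
```

Under this approximation, estimated sizes between roughly 0.042 and 0.058 (two standard errors either side of 0.05) are consistent with a nominal $\alpha = 0.05$.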

5.4.1  One-sided  Hypothesis  Test  on  a  Single  Bayes  Cost  Value. 

Four $BC_0$ values are chosen to demonstrate a range of potential classification system performance thresholds. Under the assumption that all $C_{i|j}P_j = 1$ for $i \ne j$, $BC_0 = 0.3, 0.5, 1.0, 1.25$. For the two additional cost structures (the same cost structures used in previous chapters), $BC_0$ was chosen to reflect similar scenarios (i.e., similar normal curves) with the appropriate cost structure applied. Recall

$$Cost_1 = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{bmatrix}, \qquad Cost_2 = \begin{bmatrix} 0 & 2 & 5 \\ 1 & 0 & 3 \\ 1 & 3 & 0 \end{bmatrix},$$

with all $P_j = 1/3$. This results in $BC_{0,Cost_1} = 0.1, 0.2, 0.35, 0.45$ and $BC_{0,Cost_2} = 0.2, 0.4, 0.7, 0.9$. The normal distribution parameters used to achieve these $BC_0$ values are presented in Table 5.1. To study the power at differing BC values in the alternative hypothesis, the means of the first and third classes are varied to achieve the required BC value.


Table 5.1: Distributional parameters for the parametric hypothesis test simulation.

Distribution            BC_0    Class 1            Class 2            Class 3
Normal (Equal Costs)            μ        σ         μ        σ         μ        σ
                        0.30    -2.879   1         0        1         2.879    1
                        0.50    -2.301   1         0        1         2.301    1
                        1.00    -1.349   1         0        1         1.349    1
                        1.25    -0.978   1         0        1         0.978    1
Normal (Cost_1)                 μ        σ         μ        σ         μ        σ
                        0.10    -2.879   1         0        1         2.879    1
                        0.20    -2.077   1         0        1         2.077    1
                        0.35    -1.333   1         0        1         1.333    1
                        0.45    -0.985   1         0        1         0.985    1
Normal (Cost_2)                 μ        σ         μ        σ         μ        σ
                        0.20    -2.976   1         0        1         2.976    1
                        0.40    -2.187   1         0        1         2.187    1
                        0.70    -1.408   1         0        1         1.408    1
                        0.90    -0.989   1         0        1         0.989    1
Gamma (Equal Costs)             α        β         α        β         α        β
                        0.30    1.3      1         2.3      3.7       5        10.743
                        0.50    1.3      1         2.3      3.7       5        5.234
                        1.00    1.3      1         2        1.5       4        1.889
                        1.25    1.3      1         2        1.5       4        1.162

Gamma distributed features are also considered for the equal weights scenario in order to evaluate the performance of the hypothesis tests when the assumption of normality is not met. The gamma distributional parameters used are presented in Table 5.1. To vary the BC values for evaluating the power of the test, the $\alpha$ and $\beta$ parameters of the second and third classes are varied appropriately.
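To see how the normal parameters in Table 5.1 map to the listed $BC_0$ values, BC can be computed at the optimal thresholds. The sketch below is an illustrative check, not the procedure used to choose the table's parameters. It assumes the cost matrices are indexed as $C_{i|j}$ (row $i$ = assigned class, column $j$ = true class), uses $P_j = 1/3$, and finds each threshold by a fine grid search, relying on the fact that for ordered classes BC splits into one term per threshold (the ordering of the minimizers is not enforced). The function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def optimal_bayes_cost(mu, sigma, cp, steps=4000):
    """Minimum BC over thresholds for k ordered normal classes (grid search sketch).

    Uses BC = sum_j cp[k-1][j] + sum_m min_theta sum_j (cp[m][j] - cp[m+1][j]) * Phi_j(theta),
    which holds for ordered classes; the ordering of the minimizers is not enforced.
    """
    k = len(mu)
    lo = min(mu) - 5 * max(sigma)
    hi = max(mu) + 5 * max(sigma)
    grid = [lo + (hi - lo) * g / steps for g in range(steps + 1)]
    total = sum(cp[k - 1][j] for j in range(k))
    for m in range(k - 1):
        w = [cp[m][j] - cp[m + 1][j] for j in range(k)]
        total += min(
            sum(w[j] * norm_cdf((th - mu[j]) / sigma[j]) for j in range(k))
            for th in grid
        )
    return total


def cost_times_prior(cost, prior):
    """cp[i][j] = C_{i|j} * P_j."""
    return [[cost[i][j] * prior[j] for j in range(len(prior))] for i in range(len(cost))]


equal = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]                  # all C_{i|j}P_j = 1
cost1 = cost_times_prior([[0, 1, 2], [1, 0, 1], [2, 1, 0]], [1 / 3] * 3)
sigma = [1.0, 1.0, 1.0]
bc_equal = optimal_bayes_cost([-2.879, 0.0, 2.879], sigma, equal)   # Table 5.1 lists 0.30
bc_cost1 = optimal_bayes_cost([-2.879, 0.0, 2.879], sigma, cost1)   # Table 5.1 lists 0.10
```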

The size and power of the delta and generalized hypothesis tests for equal weights are presented in Table 5.2 for a normally distributed feature and in Table 5.5 for a gamma distributed feature¹. Simulation results for a normally distributed feature are presented in Table 5.3 for $Cost_1$ and in Table 5.4 for $Cost_2$.

In general, the delta and generalized hypothesis tests perform similarly. The delta method hypothesis test is usually slightly more powerful than the generalized hypothesis test; however, when this occurs, the delta method test often has a size greater than $\alpha$, which is not desirable. Overall, the size of the generalized hypothesis test is smaller than the size of the delta method hypothesis test and is usually bounded above by $\alpha$. For $n_j = 10$ and equal weights (Table 5.2), the delta method size far exceeds 0.05 ($\hat{\alpha} \in [0.090, 0.118]$). Therefore, the generalized hypothesis test should be used instead of the delta method test for small sample sizes to ensure that $\alpha$ is maintained. As $BC_0$ approaches the value of chance classification ($BC = 1.5$ for a three-class scenario), the feature's distributions for each class become more overlapped, making determination of the optimal point and the correct class ordering more difficult. Therefore, as $BC_0$ increases, the performance of both tests degrades with respect to size (see Table 5.2, $BC_0 = 1.25$). This is more apparent for the generalized hypothesis test.

For the unequal cost scenarios with $n_j \ge 50$, the delta method performs better with respect to power if a size of approximately $\alpha$ is acceptable (Tables 5.3 and 5.4). However, the generalized hypothesis test has very similar power to the delta method test and keeps its size close to or below $\alpha$; its largest estimated size, 0.064, occurs for $n_j = 10$ and $BC_0 = 0.9$ with $Cost_2$ (Table 5.4).

As expected, the performance of both methods degrades when the feature is not normally distributed (see Table 5.5). Overall, the performance for the gamma distributed feature is fair for most scenarios, which reflects some robustness of these methods to minor deviations from normality. The notable exception is $BC_0 = 1.00$, where the estimated size of both tests ranges from 0.073 to 0.313 and grows with $n_j$.

¹ A detectable difference equal to $BC_0$ would require testing at $BC = 0$, which is not possible with a normal or gamma distributed feature. Instead, the power at this detectable difference is approximated by testing at $BC = 0.001$.
Table 5.2: Power for three classes with a normally distributed feature with equal weights. Detectable difference indicates the difference between $BC_0$ and the assumed true BC value ($BC < BC_0$). The power at a detectable difference of zero is the estimated size of the hypothesis test.

Detectable  Delta Hypothesis Test (n_j)          Generalized Hypothesis Test (n_j)
Difference  10      50      100     250          10      50      100     250
BC_0 = 0.30
0 (α)       0.118   0.061   0.071   0.063        0.018   0.028   0.043   0.046
0.01        0.131   0.086   0.106   0.122        0.023   0.038   0.069   0.092
0.05        0.193   0.273   0.415   0.678        0.040   0.158   0.312   0.614
0.10        0.332   0.660   0.882   0.999        0.087   0.508   0.823   0.997
0.20        0.758   0.999   1.000   1.000        0.364   0.998   1.000   1.000
0.30        1.000   1.000   1.000   1.000        1.000   1.000   1.000   1.000
BC_0 = 0.50
0 (α)       0.105   0.053   0.069   0.061        0.025   0.031   0.045   0.047
0.01        0.114   0.073   0.092   0.103        0.030   0.040   0.067   0.082
0.05        0.160   0.207   0.304   0.505        0.043   0.125   0.230   0.455
0.10        0.239   0.465   0.695   0.957        0.075   0.340   0.617   0.944
0.20        0.501   0.945   0.997   1.000        0.210   0.898   0.994   1.000
0.30        0.810   1.000   1.000   1.000        0.503   1.000   1.000   1.000
0.40        0.984   1.000   1.000   1.000        0.890   1.000   1.000   1.000
0.50        1.000   1.000   1.000   1.000        1.000   1.000   1.000   1.000
BC_0 = 1.00
0 (α)       0.090   0.055   0.064   0.056        0.044   0.043   0.054   0.051
0.01        0.097   0.071   0.085   0.096        0.047   0.054   0.074   0.085
0.05        0.134   0.164   0.246   0.399        0.069   0.128   0.213   0.383
0.10        0.193   0.358   0.558   0.870        0.108   0.308   0.525   0.860
0.20        0.380   0.818   0.975   1.000        0.229   0.788   0.968   1.000
0.30        0.586   0.990   1.000   1.000        0.430   0.985   1.000   1.000
0.40        0.798   1.000   1.000   1.000        0.649   1.000   1.000   1.000
0.50        0.934   1.000   1.000   1.000        0.859   1.000   1.000   1.000
BC_0 = 1.25
0 (α)       0.095   0.062   0.061   0.057        0.068   0.054   0.055   0.054
0.01        0.103   0.076   0.086   0.095        0.074   0.071   0.081   0.093
0.05        0.144   0.165   0.240   0.401        0.106   0.153   0.231   0.394
0.10        0.202   0.361   0.551   0.871        0.158   0.340   0.537   0.867
0.20        0.384   0.813   0.971   1.000        0.315   0.798   0.970   1.000
0.30        0.584   0.987   1.000   1.000        0.518   0.985   1.000   1.000
0.40        0.789   1.000   1.000   1.000        0.725   1.000   1.000   1.000
0.50        0.920   1.000   1.000   1.000        0.887   1.000   1.000   1.000
Table 5.3: Power for three classes with a normally distributed feature with the $Cost_1$ cost structure. Detectable difference indicates the difference between $BC_0$ and the assumed true BC value ($BC < BC_0$). The power at a detectable difference of zero is the estimated size of the hypothesis test.

Detectable  Delta Hypothesis Test (n_j)          Generalized Hypothesis Test (n_j)
Difference  10      50      100     250          10      50      100     250
BC_0 = 0.10
0 (α)       0.118   0.061   0.071   0.063        0.017   0.027   0.043   0.046
0.01        0.161   0.164   0.228   0.355        0.029   0.080   0.151   0.292
0.05        0.526   0.951   0.997   1.000        0.175   0.891   0.994   1.000
0.10        1.000   1.000   1.000   1.000        1.000   1.000   1.000   1.000
BC_0 = 0.20
0 (α)       0.097   0.052   0.068   0.061        0.022   0.030   0.045   0.046
0.01        0.126   0.119   0.164   0.247        0.032   0.066   0.118   0.210
0.05        0.318   0.698   0.917   0.999        0.116   0.590   0.880   0.999
0.10        0.715   0.998   1.000   1.000        0.413   0.998   1.000   1.000
0.15        0.980   1.000   1.000   1.000        0.892   1.000   1.000   1.000
0.20        1.000   1.000   1.000   1.000        1.000   1.000   1.000   1.000
BC_0 = 0.35
0 (α)       0.089   0.062   0.064   0.060        0.032   0.038   0.046   0.049
0.01        0.111   0.098   0.130   0.190        0.041   0.070   0.106   0.167
0.05        0.252   0.519   0.764   0.980        0.109   0.436   0.712   0.974
0.10        0.532   0.972   0.999   1.000        0.317   0.948   0.998   1.000
0.15        0.840   1.000   1.000   1.000        0.636   1.000   1.000   1.000
0.20        0.977   1.000   1.000   1.000        0.911   1.000   1.000   1.000
0.25        1.000   1.000   1.000   1.000        0.996   1.000   1.000   1.000
0.30        1.000   1.000   1.000   1.000        1.000   1.000   1.000   1.000
BC_0 = 0.45
0 (α)       0.106   0.055   0.071   0.062        0.038   0.044   0.048   0.051
0.01        0.112   0.099   0.117   0.167        0.048   0.073   0.099   0.151
0.05        0.236   0.437   0.660   0.939        0.117   0.375   0.616   0.931
0.10        0.464   0.907   0.992   1.000        0.289   0.878   0.991   1.000
0.15        0.742   0.998   1.000   1.000        0.556   0.997   1.000   1.000
0.20        0.931   1.000   1.000   1.000        0.833   1.000   1.000   1.000
0.25        0.993   1.000   1.000   1.000        0.965   1.000   1.000   1.000
0.30        1.000   1.000   1.000   1.000        0.998   1.000   1.000   1.000
Table 5.4: Power for three classes with a normally distributed feature with the $Cost_2$ cost structure. Detectable difference indicates the difference between $BC_0$ and the assumed true BC value ($BC < BC_0$). The power at a detectable difference of zero is the estimated size of the hypothesis test.

Detectable  Delta Hypothesis Test (n_j)          Generalized Hypothesis Test (n_j)
Difference  10      50      100     250          10      50      100     250
BC_0 = 0.20
0 (α)       0.136   0.072   0.077   0.065        0.022   0.034   0.047   0.046
0.01        0.158   0.105   0.134   0.158        0.028   0.049   0.085   0.118
0.05        0.274   0.415   0.604   0.890        0.066   0.269   0.497   0.846
0.10        0.515   0.903   0.992   1.000        0.175   0.811   0.982   1.000
0.15        0.828   1.000   1.000   1.000        0.454   1.000   1.000   1.000
0.20        1.000   1.000   1.000   1.000        0.999   1.000   1.000   1.000
BC_0 = 0.40
0 (α)       0.119   0.065   0.071   0.060        0.031   0.037   0.050   0.046
0.01        0.132   0.084   0.113   0.119        0.036   0.054   0.077   0.099
0.05        0.193   0.253   0.380   0.634        0.065   0.172   0.306   0.588
0.10        0.310   0.600   0.829   0.993        0.115   0.487   0.773   0.991
0.15        0.465   0.892   0.991   1.000        0.206   0.830   0.984   1.000
0.20        0.656   0.992   1.000   1.000        0.339   0.983   1.000   1.000
0.25        0.828   1.000   1.000   1.000        0.552   1.000   1.000   1.000
0.30        0.954   1.000   1.000   1.000        0.784   1.000   1.000   1.000
BC_0 = 0.70
0 (α)       0.104   0.062   0.070   0.058        0.046   0.047   0.057   0.051
0.01        0.114   0.080   0.095   0.102        0.049   0.060   0.077   0.091
0.05        0.160   0.198   0.294   0.497        0.082   0.150   0.255   0.470
0.10        0.243   0.460   0.669   0.951        0.128   0.392   0.632   0.946
0.15        0.339   0.725   0.938   1.000        0.190   0.668   0.920   1.000
0.20        0.468   0.921   0.997   1.000        0.285   0.885   0.994   1.000
0.25        0.609   0.988   1.000   1.000        0.403   0.983   1.000   1.000
0.30        0.731   0.999   1.000   1.000        0.549   0.999   1.000   1.000
BC_0 = 0.90
0 (α)       0.105   0.065   0.063   0.058        0.064   0.056   0.056   0.053
0.01        0.113   0.082   0.094   0.100        0.072   0.073   0.086   0.095
0.05        0.155   0.187   0.267   0.457        0.103   0.161   0.251   0.441
0.10        0.226   0.417   0.622   0.915        0.151   0.386   0.601   0.910
0.15        0.318   0.672   0.903   0.999        0.225   0.636   0.891   0.999
0.20        0.427   0.870   0.990   1.000        0.314   0.845   0.987   1.000
0.25        0.553   0.973   1.000   1.000        0.420   0.965   1.000   1.000
0.30        0.670   0.997   1.000   1.000        0.545   0.996   1.000   1.000



Table 5.5: Power for three classes with a gamma distributed feature with equal weights. Detectable difference indicates the difference between $BC_0$ and the assumed true BC value ($BC < BC_0$). The power at a detectable difference of zero is the estimated size of the hypothesis test.

Detectable  Delta Hypothesis Test (n_j)          Generalized Hypothesis Test (n_j)
Difference  10      50      100     250          10      50      100     250
BC_0 = 0.30
0 (α)       0.085   0.038   0.034   0.014        0.012   0.017   0.018   0.012
0.01        0.092   0.051   0.050   0.031        0.015   0.023   0.032   0.022
0.05        0.143   0.170   0.216   0.353        0.026   0.097   0.160   0.294
0.10        0.273   0.573   0.817   0.990        0.087   0.457   0.756   0.986
0.20        0.603   0.989   1.000   1.000        0.296   0.974   1.000   1.000
0.30        0.832   0.989   0.999   1.000        0.975   1.000   1.000   1.000
BC_0 = 0.50
0 (α)       0.123   0.089   0.107   0.089        0.031   0.055   0.080   0.073
0.01        0.131   0.111   0.139   0.153        0.036   0.070   0.111   0.130
0.05        0.182   0.264   0.375   0.607        0.054   0.188   0.315   0.571
0.10        0.270   0.547   0.768   0.975        0.093   0.432   0.711   0.968
0.20        0.534   0.965   0.999   1.000        0.242   0.932   0.998   1.000
0.30        0.747   1.000   1.000   1.000        0.599   1.000   1.000   1.000
0.40        0.913   1.000   1.000   1.000        0.883   1.000   1.000   1.000
0.50        0.833   0.989   0.999   1.000        1.000   1.000   1.000   1.000
BC_0 = 1.00
0 (α)       0.142   0.171   0.221   0.313        0.073   0.146   0.202   0.298
0.01        0.154   0.201   0.281   0.418        0.083   0.174   0.256   0.406
0.05        0.205   0.371   0.546   0.819        0.123   0.329   0.515   0.807
0.10        0.287   0.609   0.825   0.989        0.182   0.574   0.804   0.987
0.20        0.484   0.930   0.995   1.000        0.358   0.913   0.995   1.000
0.30        0.660   0.997   1.000   1.000        0.562   0.996   1.000   1.000
0.40        0.778   1.000   1.000   1.000        0.766   1.000   1.000   1.000
0.50        0.962   1.000   1.000   1.000        0.919   1.000   1.000   1.000
BC_0 = 1.25
0 (α)       0.129   0.099   0.085   0.061        0.117   0.094   0.083   0.061
0.01        0.143   0.121   0.125   0.112        0.127   0.117   0.117   0.110
0.05        0.198   0.258   0.341   0.493        0.169   0.240   0.332   0.490
0.10        0.286   0.511   0.701   0.931        0.249   0.490   0.692   0.930
0.20        0.490   0.913   0.992   1.000        0.439   0.905   0.991   1.000
0.30        0.690   0.999   1.000   1.000        0.661   0.998   1.000   1.000
0.40        0.839   1.000   1.000   1.000        0.847   1.000   1.000   1.000
0.50        0.921   1.000   1.000   1.000        0.953   1.000   1.000   1.000
5.4.2 One-sided Hypothesis Test on the Difference of Two Bayes Cost Values.

To evaluate the performance of the delta method and generalized hypothesis tests for comparing two independent classification systems with respect to their BC values, $\eta_0$ is fixed at zero. All three cost structures considered previously are also used here: all $C_{i|j}P_j = 1$ (for $i \ne j$), $Cost_1 = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{bmatrix}$, and $Cost_2 = \begin{bmatrix} 0 & 2 & 5 \\ 1 & 0 & 3 \\ 1 & 3 & 0 \end{bmatrix}$ (all with $P_j = 1/3$). The purpose of this hypothesis test is to compare two competing classification systems; therefore, the same cost structure is always applied to classification systems A and B. When the costs of misclassification are equal, both normal and gamma distributed features are considered. For the unequal cost scenarios, only normally distributed features are used. In order to evaluate the size and power of the test, the performance of classification system A is fixed ($BC_A = 0.80$ for equal costs, $BC_A = 0.50$ for $Cost_1$, and $BC_A = 0.70$ for $Cost_2$) and the performance of classification system B is varied to achieve the desired $\eta$ values.

The power and size of each hypothesis test are estimated by simulation. The results for equal costs are presented in Table 5.6 for a normally distributed feature and in Table 5.7 for a gamma distributed feature. The results for $Cost_1$ and $Cost_2$ are presented in Tables 5.8 and 5.9, respectively. When all $C_{i|j}P_j = 1$ for $i \ne j$, the delta and generalized hypothesis tests perform similarly well. Again, for $n_j = 10$ the delta method hypothesis test has size greater than $\alpha$ ($\hat{\alpha} \in [0.053, 0.061]$), though not by a large margin, and it maintains equivalent or higher power than the generalized hypothesis test (Table 5.6). The gamma distributed feature does not degrade the performance of the hypothesis tests as much as it does when testing a single BC value (see Section 5.4.1). In fact, the performance with the gamma distributed feature is good, with size $\approx \alpha$ (Table 5.7). Since $\eta$ is the difference of the BC values, and is therefore a function of the difference of the distributions, this test statistic may be closer to normally distributed than the statistic for the one-sided test on a single BC value with a gamma distributed feature.

As with the one-sided hypothesis tests on a single BC value, when costs are unequal, the delta method hypothesis test has slightly higher power than the generalized hypothesis test. However, the delta method hypothesis test again also has slightly worse size than the generalized hypothesis test (see Tables 5.8 and 5.9).
Table 5.6: Power for three-class systems with normally distributed features with equal weights for testing $H_0: \eta \le 0$. Detectable difference indicates the assumed true value of $\eta = BC_A - BC_B$ ($\eta > 0$). The power at a detectable difference of zero is the estimated size of the hypothesis test.

Detectable  Delta Hypothesis Test (n_j)          Generalized Hypothesis Test (n_j)
Difference  10      50      100     250          10      50      100     250
0 (α)       0.061   0.052   0.043   0.052        0.047   0.048   0.041   0.051
0.01        0.066   0.060   0.056   0.078        0.049   0.057   0.054   0.076
0.05        0.086   0.112   0.147   0.254        0.066   0.109   0.142   0.253
0.10        0.116   0.222   0.345   0.629        0.095   0.213   0.338   0.628
0.15        0.155   0.374   0.593   0.908        0.125   0.368   0.588   0.906
0.20        0.206   0.558   0.816   0.991        0.169   0.550   0.811   0.991
0.25        0.268   0.733   0.939   1.000        0.222   0.724   0.939   1.000
0.30        0.340   0.877   0.989   1.000        0.293   0.869   0.989   1.000
0.35        0.424   0.951   0.998   1.000        0.364   0.948   0.998   1.000
0.40        0.504   0.985   1.000   1.000        0.450   0.983   1.000   1.000
0.45        0.594   0.995   1.000   1.000        0.541   0.995   1.000   1.000
0.50        0.685   1.000   1.000   1.000        0.638   1.000   1.000   1.000

Table  5.7:  Power  for  three-elass  systems  with  gamma  distributed  features  with  equal  weights  for 
testing  T]  <0  .  Deteetable  differenee  indieates  the  differenee  of  the  assumed  true  value  of  BCa  -  BCb 
(t]  >  0).  The  power  at  a  deteetable  differenee  of  zero  is  fhe  esfimafed  size  of  fhe  hypofhesis  fesf. 


Detectable    Delta Hypothesis Test           Generalized Hypothesis Test
Difference    nj=10   50      100     250     10      50      100     250
0 (α)         0.070   0.059   0.066   0.059   0.035   0.055   0.062   0.059
0.01          0.073   0.067   0.082   0.085   0.038   0.064   0.078   0.085
0.05          0.094   0.124   0.166   0.242   0.054   0.117   0.165   0.240
0.10          0.121   0.215   0.322   0.542   0.081   0.209   0.321   0.539
0.15          0.152   0.334   0.502   0.808   0.112   0.328   0.499   0.812
0.20          0.194   0.480   0.690   0.953   0.157   0.478   0.694   0.954
0.25          0.266   0.684   0.901   0.999   0.202   0.670   0.896   0.999
0.30          0.333   0.841   0.974   1.000   0.263   0.834   0.975   1.000
0.35          0.409   0.934   0.996   1.000   0.337   0.929   0.996   1.000
0.40          0.499   0.981   1.000   1.000   0.422   0.979   1.000   1.000
0.45          0.586   0.996   1.000   1.000   0.519   0.996   1.000   1.000
0.50          0.681   0.999   1.000   1.000   0.614   0.999   1.000   1.000
Table 5.8: Power for three-class systems with normally distributed features with the $Cost_1$ structure for testing $\eta \le 0$. Detectable difference indicates the difference of the assumed true value of $BC_A - BC_B$ ($\eta > 0$). The power at a detectable difference of zero is the estimated size of the hypothesis test.


Detectable    Delta Hypothesis Test           Generalized Hypothesis Test
Difference    nj=10   50      100     250     10      50      100     250
0 (α)         0.060   0.050   0.044   0.051   0.037   0.042   0.052   0.053
0.01          0.071   0.078   0.073   0.109   0.043   0.065   0.088   0.109
0.05          0.126   0.241   0.366   0.667   0.081   0.241   0.366   0.675
0.10          0.233   0.605   0.857   0.997   0.168   0.601   0.851   0.995
0.15          0.378   0.909   0.994   1.000   0.313   0.895   0.994   1.000
0.20          0.570   0.992   1.000   1.000   0.512   0.991   1.000   1.000
0.25          0.752   1.000   1.000   1.000   0.704   1.000   1.000   1.000
0.30          0.893   1.000   1.000   1.000   0.869   1.000   1.000   1.000
0.35          0.971   1.000   1.000   1.000   0.962   1.000   1.000   1.000
0.40          0.995   1.000   1.000   1.000   0.995   1.000   1.000   1.000
0.45          0.999   1.000   1.000   1.000   1.000   1.000   1.000   1.000
0.50          1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
Table 5.9: Power for three-class systems with normally distributed features with the $Cost_2$ structure for testing $\eta \le 0$. Detectable difference indicates the difference of the assumed true value of $BC_A - BC_B$ ($\eta > 0$). The power at a detectable difference of zero is the estimated size of the hypothesis test.


Detectable    Delta Hypothesis Test           Generalized Hypothesis Test
Difference    nj=10   50      100     250     10      50      100     250
0 (α)         0.053   0.053   0.048   0.051   0.037   0.045   0.048   0.049
0.01          0.059   0.061   0.064   0.079   0.040   0.055   0.065   0.075
0.05          0.082   0.129   0.168   0.300   0.057   0.120   0.171   0.286
0.10          0.124   0.266   0.412   0.728   0.087   0.263   0.403   0.731
0.15          0.179   0.453   0.706   0.964   0.131   0.462   0.699   0.966
0.20          0.242   0.669   0.903   0.998   0.186   0.664   0.896   0.998
0.25          0.319   0.850   0.983   1.000   0.255   0.837   0.982   1.000
0.30          0.424   0.944   0.998   1.000   0.343   0.937   0.999   1.000
0.35          0.528   0.988   1.000   1.000   0.443   0.984   1.000   1.000
0.40          0.629   0.999   1.000   1.000   0.548   0.997   1.000   1.000
0.45          0.727   1.000   1.000   1.000   0.672   1.000   1.000   1.000
0.50          0.822   1.000   1.000   1.000   0.784   1.000   1.000   1.000

5.5  Summary 

Generalized and delta method hypothesis tests were developed for testing one-sided hypotheses on a single BC value as well as on the difference between two BC values for comparing independent competing classification systems. Both methods were developed assuming classification systems that use a single feature that is independently and normally distributed for each class. The performance of the proposed methods was demonstrated with simulations that evaluated the power and size of the tests. Varying scenarios as well as null hypothesis values were considered in the simulation.

In general, the generalized hypothesis test performed better and could be recommended for both forms of hypotheses (tests on $BC_0$ and $\eta$) and the various cost scenarios. Although the delta method tests performed similarly to the generalized tests and often had greater power, their size was sometimes greater than $\alpha$, which is not desirable. However, the delta method performance was improved for tests on $\eta$, which might be due to the increase in total sample size (when considering two classification systems instead of one) or the structure of the test statistic itself. For both methods, the performance with respect to size was degraded when testing against the $BC_0$ value that was close to chance classification ($BC_0 = 1.5$). When the assumption of normality was not met, the performance of the hypothesis tests on a single $BC_0$ value was degraded. However, for testing the difference of two BC values, the performance of the tests remained fairly consistent. Therefore, it seems that when testing a hypothesis on $\eta$, the methods are more robust to departures from normality.
VI.  Nonparametric  Hypothesis  Tests


6.1  Introduction 

In this chapter, hypothesis tests for testing the performance of a classification system with BC are developed, making no assumptions about the classification system's underlying feature distributions or structure. Instead, inference methods are derived from the resulting classification outcomes of a classification system at a fixed $\theta \in \Theta$, as was done for the nonparametric CIs derived in Chapter 4. Under this nonparametric framework, it is assumed that the classification system outcomes from each class may be modeled with independent multinomial distributions.

This chapter will consider the same two hypotheses that were developed in Chapter 5 under the parametric framework. The first hypothesis tests whether or not a classification system performs at least as well as a specified threshold value, $BC_0$, where

$$ H_0: BC \ge BC_0 \quad \text{vs.} \quad H_1: BC < BC_0 \tag{5.1} $$

The second hypothesis considered compares the performance of two independent competing classification systems. This is done by testing $\eta$, the difference in BC values from the two systems, where

$$ \eta = BC_A - BC_B \tag{5.2} $$

and

$$ H_0: \eta \le \eta_0 \quad \text{vs.} \quad H_1: \eta > \eta_0 \tag{5.3} $$

For the specific case of testing whether classification system B is performing better than classification system A, this hypothesis is tested at $\eta_0 = 0$.

In Section 6.2, exact small sample methods are developed for testing both hypotheses, using the fiducial theory developed in Section 4.2. In Section 6.3, nonparametric hypothesis tests for both hypotheses are developed for large sample sizes using likelihood ratio tests (LRTs). A simulation is conducted to demonstrate the performance of the tests with respect to power and size, and the results are presented in Section 6.4. The overall findings and conclusions are presented in Section 6.5.
6.2  Exact  Hypothesis  Tests

Under the nonparametric framework and when sample sizes are small, hypothesis tests may be conducted using exact methods similar to those used for developing fiducial intervals around BC in Section 4.2. In fact, the fiducial intervals presented in Section 4.2 are simply the inversion of the acceptance region of a two-sided hypothesis test ($BC = BC_0$) on BC [7].

6.2.1  One-sided  Hypothesis  Test  on  a  Single  Bayes  Cost  Value. 

The hypothesis of the form

$$ H_0: BC \ge BC_0 \quad \text{vs.} \quad H_1: BC < BC_0 \tag{5.1} $$

may be tested by calculating an associated p-value for the test. Recall from Theorem 7 (Section 5.2.1) that a valid p-value is given by

$$ p(\mathbf{x}) = \sup_{\theta \in \Theta_0} P_{\theta}\big(W(\mathbf{X}) \ge W(\mathbf{x})\big) \tag{6.1} $$

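when large values of $W(\mathbf{X})$ give evidence that $H_1$ is true. For the nonparametric framework, $W(\mathbf{X})$ is BC defined empirically as

$$ \widehat{BC} = \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} C_{i|j} P_j \frac{X_{i|j}}{n_j} \tag{4.27} $$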

where each $X_{i|j}$ represents the number of observations classified as class $i$ when their true class is $j$, $n_j$ is the total number of observations for class $j$, and each $\mathbf{X}_{\cdot|j} = (X_{1|j}, \ldots, X_{k|j})$ is distributed multinomial. For the hypothesis in Equation 5.1, however, large values of $W(\mathbf{X})$ give evidence that $H_1$ is false, and therefore the p-value for this test is

$$ p(\mathbf{x}) = \sup_{\theta \in \Theta_0} P_{\theta}\big(W(\mathbf{X}) \le W(\mathbf{x})\big) \tag{5.5} $$

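Under the multinomial framework, a restriction on the BC parameter space is also a restriction on the joint multinomial parameter space, $\mathcal{S} = \{\mathbf{p} = (\mathbf{p}_1, \ldots, \mathbf{p}_k) : \mathbf{p}_j = (p_{1|j}, \ldots, p_{k|j}),\ p_{i|j} \ge 0,\ \text{and } \sum_{i=1}^{k} p_{i|j} = 1\}$. Thus, the hypotheses may be rewritten as

$$ H_0: \mathbf{p} \in \mathcal{S}_0 \quad \text{vs.} \quad H_1: \mathbf{p} \in \mathcal{S}_0^c \tag{6.2} $$

where $\mathcal{S}_0$ is the set of multinomial probabilities which result in $BC \ge BC_0$, and is defined as

$$ \mathcal{S}_0 = \Big\{ \mathbf{p} = (\mathbf{p}_1, \ldots, \mathbf{p}_k) : \mathbf{p}_j = (p_{1|j}, \ldots, p_{k|j}),\ p_{i|j} \ge 0,\ \sum_{i=1}^{k} p_{i|j} = 1,\ \text{and } \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} C_{i|j} P_j p_{i|j} \ge BC_0 \Big\} $$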
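As a small illustration of Equation 4.27 and of membership in $\mathcal{S}_0$, the Python sketch below computes the empirical BC from a matrix of classification counts and checks whether a matrix of probabilities lies in the null set. The function names are hypothetical, and the indexing convention (`counts[j][i]` $= x_{i|j}$, `cost[j][i]` $= C_{i|j}$, `prior[j]` $= P_j$, with `j` the true class) is an assumption of the sketch.

```python
def bc_from_probs(p, cost, prior):
    """Bayes Cost: sum over i != j of C_{i|j} * P_j * p_{i|j}.

    Assumed indexing: p[j][i] = p_{i|j}, cost[j][i] = C_{i|j}, prior[j] = P_j,
    where j is the true class and i the assigned class.
    """
    k = len(p)
    return sum(cost[j][i] * prior[j] * p[j][i]
               for j in range(k) for i in range(k) if i != j)


def empirical_bc(counts, cost, prior):
    """Empirical Bayes Cost (Equation 4.27): replace p_{i|j} by x_{i|j} / n_j."""
    p_hat = [[x / sum(row) for x in row] for row in counts]
    return bc_from_probs(p_hat, cost, prior)


def in_null_set(p, cost, prior, bc0, tol=1e-12):
    """True if p holds valid multinomial probability vectors with BC(p) >= BC_0."""
    valid = all(min(row) >= 0 and abs(sum(row) - 1) <= tol for row in p)
    return valid and bc_from_probs(p, cost, prior) >= bc0 - tol


# Hypothetical three-class example: equal costs, P_j = 1, n_j = 10.
counts = [[8, 1, 1],   # true class 1
          [2, 7, 1],   # true class 2
          [0, 1, 9]]   # true class 3
cost = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
prior = [1, 1, 1]
print(empirical_bc(counts, cost, prior))  # approximately 0.6 = 0.2 + 0.3 + 0.1
```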
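From an observed $y = \widehat{BC}$ (with $Y$ denoting the random variable $\widehat{BC}$), the exact p-value for testing the hypothesis $BC \ge BC_0$ is given by

$$
\begin{aligned}
p(\mathbf{x}) &= \sup_{BC \ge BC_0} P_{BC}(Y \le y) \\
&= \sup_{\mathbf{p} \in \mathcal{S}_0} P_{\mathbf{p}}(Y \le y) \\
&= \sup_{\mathbf{p} \in \mathcal{S}_0} \sum_{t=0}^{y} \sum_{\mathbf{x} \in \mathcal{X} :\, Y = t} f_{\mathbf{X}}(\mathbf{x} \mid \mathbf{p})
\end{aligned} \tag{6.3}
$$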


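where $\mathcal{X}$ is the joint multinomial sample space, which is the set of $1 \times k^2$ sized vectors $\mathbf{x} = (x_{1|1}, x_{2|1}, \ldots, x_{k-1|k}, x_{k|k})$ where each $x_{i|j}$ is a nonnegative integer and $\sum_{i=1}^{k} x_{i|j} = n_j$, $\mathbf{p} \in \mathcal{S}$, and

$$ f_{\mathbf{X}}(\mathbf{x} \mid \mathbf{p}) = \prod_{j=1}^{k} f_{\mathbf{X}_j}(\mathbf{x}_j) = \prod_{j=1}^{k} n_j! \prod_{i=1}^{k} \frac{p_{i|j}^{x_{i|j}}}{x_{i|j}!} \tag{4.8} $$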


The hypothesis is tested by calculating the p-value in Equation 6.3 and comparing this value to the chosen significance level, $\alpha$. For $p(\mathbf{x})$ less than $\alpha$, the null hypothesis is rejected.
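The inner double sum of Equation 6.3 can be evaluated directly for one fixed $\mathbf{p}$ by enumerating the joint multinomial sample space. The Python sketch below does this under the same assumed indexing as before (`p[j][i]` $= p_{i|j}$); the function names are hypothetical. It does not compute the supremum over $\mathcal{S}_0$, which would need a separate search, so evaluating it at any single $\mathbf{p} \in \mathcal{S}_0$ only gives a lower bound on the exact p-value. Attainable values of $\widehat{BC}$ are stored as `fractions.Fraction` objects so that ties with the observed $y$ are compared exactly.

```python
import math
from fractions import Fraction


def compositions(n, k):
    """Yield every k-tuple of nonnegative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_pmf(x, probs):
    """Multinomial pmf n! * prod_i p_i**x_i / x_i! for one true class."""
    out = float(math.factorial(sum(x)))
    for xi, pi in zip(x, probs):
        out *= pi ** xi / math.factorial(xi)
    return out


def exact_lower_tail(y, p, n, cost, prior):
    """P_p(BC_hat <= y) for a fixed p, by enumerating the joint sample space.

    Assumed indexing: p[j][i] = p_{i|j}, cost[j][i] = C_{i|j}, prior[j] = P_j.
    """
    k = len(n)
    y = Fraction(str(y))
    dist = {Fraction(0): 1.0}  # distribution of the partial sum over classes
    for j in range(k):
        w = [Fraction(str(cost[j][i])) * Fraction(str(prior[j])) for i in range(k)]
        class_dist = {}
        for x in compositions(n[j], k):
            value = sum(w[i] * x[i] for i in range(k) if i != j) / n[j]
            class_dist[value] = class_dist.get(value, 0.0) + multinomial_pmf(x, p[j])
        # Classes are independent, so convolve the partial and per-class distributions.
        new = {}
        for a, pa in dist.items():
            for b, pb in class_dist.items():
                new[a + b] = new.get(a + b, 0.0) + pa * pb
        dist = new
    return sum(prob for value, prob in dist.items() if value <= y)


# Equal costs, P_j = 1, n_j = 5, misclassification mass split equally so BC = 0.5.
p = [[5/6, 1/12, 1/12], [1/12, 5/6, 1/12], [1/12, 1/12, 5/6]]
print(exact_lower_tail(0, p, [5, 5, 5], [[0, 1, 1], [1, 0, 1], [1, 1, 0]], [1, 1, 1]))
```

As a quick consistency check: for $y = 0$ only the all-correct outcome contributes, and under equal costs with $P_j = 1$ the supremum over $\mathcal{S}_0$ is attained at this equal split, giving $(1 - BC_0/3)^{15}$. That is about 0.21 for $BC_0 = 0.3$ and about 0.06 for $BC_0 = 0.5$, in line with the p-values quoted for $n_j = 5$ in Section 6.4.1. Enumeration grows quickly with $n_j$ and $k$, which is consistent with reserving the exact method for small samples.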

6.2.2  One-sided  Hypothesis  Test  on  the  Difference  of  Two  Bayes  Cost  Values. 

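To test the hypothesis of the form

$$ H_0: \eta \le \eta_0 \quad \text{vs.} \quad H_1: \eta > \eta_0 \tag{5.3} $$

under the framework of an exact hypothesis test, modeling the outcomes from the two classification systems with independent multinomial distributions, the parameter of interest, $\eta$, is a function of multinomial probabilities such that

$$ \eta = \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} C_{i|j,A} P_j p_{i|j,A} - \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} C_{i|j,B} P_j p_{i|j,B} \tag{6.4} $$

and

$$ Y = \hat{\eta} = \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} C_{i|j,A} P_j \frac{X_{i|j,A}}{n_{j,A}} - \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} C_{i|j,B} P_j \frac{X_{i|j,B}}{n_{j,B}} \tag{6.5} $$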


Define $\mathcal{X}_A$ as the joint multinomial sample space for classification system A, which is the set of $1 \times k^2$ sized vectors where $\mathcal{X}_A = \{\mathbf{x}_A = (\mathbf{x}_{1,A}, \ldots, \mathbf{x}_{k,A}) : \mathbf{x}_{j,A} = (x_{1|j,A}, \ldots, x_{k|j,A}),\ x_{i|j,A} \in \{0, 1, 2, \ldots\},\ \text{and } \sum_{i=1}^{k} x_{i|j,A} = n_{j,A}\}$. Similarly, define $\mathcal{X}_B$ as the analogous joint multinomial sample space for classification system B. Then the sample space for the entire experiment (for both classification systems) may be defined as $\mathcal{X}_{A,B}$, which is the set of $1 \times 2k^2$ sized vectors where $\mathcal{X}_{A,B} = \{(\mathbf{x}_A, \mathbf{x}_B) : \mathbf{x}_A \in \mathcal{X}_A,\ \mathbf{x}_B \in \mathcal{X}_B\}$. Also, define the joint multinomial probability space for classification system A, where $\mathbf{p}_A \in \mathcal{S} = \{\mathbf{p}_A = (\mathbf{p}_{1,A}, \ldots, \mathbf{p}_{k,A}) : \mathbf{p}_{j,A} = (p_{1|j,A}, \ldots, p_{k|j,A}),\ p_{i|j,A} \ge 0,\ \text{and } \sum_{i=1}^{k} p_{i|j,A} = 1\}$, and similarly define $\mathbf{p}_B$ for classification system B. The pmf for this experiment is the joint multinomial distribution from both classification systems such that
distribution  from  both  classification  systems  such  that 

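$$
\begin{aligned}
f_{\mathbf{X}_A, \mathbf{X}_B}(\mathbf{x}_A, \mathbf{x}_B \mid \mathbf{p}_A, \mathbf{p}_B) &= \prod_{j=1}^{k} f_{\mathbf{X}_{j,A}}(\mathbf{x}_{j,A}) \times f_{\mathbf{X}_{j,B}}(\mathbf{x}_{j,B}) \\
&= \prod_{j=1}^{k} n_{j,A}!\, n_{j,B}! \prod_{i=1}^{k} \frac{p_{i|j,A}^{x_{i|j,A}}\, p_{i|j,B}^{x_{i|j,B}}}{x_{i|j,A}!\, x_{i|j,B}!}
\end{aligned} \tag{6.6}
$$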


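Once again, the hypotheses may be rewritten as a restriction on the joint multinomial parameter space:

$$ H_0: (\mathbf{p}_A, \mathbf{p}_B) \in \mathcal{S}^2_0 \quad \text{vs.} \quad H_1: (\mathbf{p}_A, \mathbf{p}_B) \in (\mathcal{S}^2_0)^c \tag{6.7} $$

where $\mathcal{S}^2_0 = \{(\mathbf{p}_A, \mathbf{p}_B) : \mathbf{p}_A \in \mathcal{S},\ \mathbf{p}_B \in \mathcal{S},\ \text{and } \eta \le \eta_0\}$. Then, for an observed $\hat{\eta}$ from the two classification systems, the exact p-value for testing the hypothesis in Equation 5.3 is

$$
\begin{aligned}
p(\mathbf{x}) &= \sup_{\eta \le \eta_0} P_{\eta}(Y \ge y) \\
&= \sup_{(\mathbf{p}_A, \mathbf{p}_B) \in \mathcal{S}^2_0} P_{(\mathbf{p}_A, \mathbf{p}_B)}(Y \ge y) \\
&= \sup_{(\mathbf{p}_A, \mathbf{p}_B) \in \mathcal{S}^2_0} \sum_{t=y}^{\sup \mathcal{Y}} \sum_{(\mathbf{x}_A, \mathbf{x}_B) \in \mathcal{X}_{A,B} :\, Y = t} f_{\mathbf{X}_A, \mathbf{X}_B}(\mathbf{x}_A, \mathbf{x}_B \mid \mathbf{p}_A, \mathbf{p}_B)
\end{aligned} \tag{6.8}
$$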

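where $\mathcal{Y} = \big\{ y : y = \sum_{i=1}^{k} \sum_{j \ne i} C_{i|j,A} P_j \frac{x_{i|j,A}}{n_{j,A}} - \sum_{i=1}^{k} \sum_{j \ne i} C_{i|j,B} P_j \frac{x_{i|j,B}}{n_{j,B}},\ (\mathbf{x}_A, \mathbf{x}_B) \in \mathcal{X}_{A,B} \big\}$. For an observed value of $\hat{\eta}$, $y$, and a fixed $\eta_0$, the hypothesis is tested by calculating the p-value in Equation 6.8 and comparing this value to the chosen significance level, $\alpha$. If $p(\mathbf{x})$ is less than $\alpha$, reject the null hypothesis.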


6.3  Likelihood  Ratio  Tests 

LRTs are a general and common method that may be applied for hypothesis testing. Asymptotic properties of the likelihood ratio also make these tests easy to implement under large sample assumptions. For the nonparametric methods developed in this section, it is assumed that each class has a large sample size ($n_j \ge 50$).

Definition  7  (Likelihood  Ratio  Test  Statistic). 

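The likelihood ratio test statistic for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$ is

$$ \lambda(\mathbf{x}) = \frac{\sup_{\Theta_0} L(\theta \mid \mathbf{x})}{\sup_{\Theta} L(\theta \mid \mathbf{x})} $$

[12, p. 375]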

To conduct a hypothesis test using the likelihood ratio test statistic for large sample sizes, the following theorem may be used:

Theorem 8. Let $X_1, \ldots, X_n$ be a random sample from a pdf or pmf $f(x \mid \theta)$. Under the regularity conditions ..., if $\theta \in \Theta_0$, then the distribution of the statistic $-2 \log \lambda(\mathbf{X})$ converges to a chi squared distribution as the sample size $n \to \infty$. The degrees of freedom of the limiting distribution is the difference between the number of free parameters specified by $\theta \in \Theta_0$ and the number of free parameters specified by $\theta \in \Theta$ [12, p. 490].

Regularity  conditions  are  addressed  in  the  Appendix,  Section  A.5. 

6.3.1  One-sided  Hypothesis  Test  on  a  Single  Bayes  Cost  Value. 

For this nonparametric large sample framework, it is again assumed that the outcomes from the classification system are distributed multinomial. Recall from Section 6.2 that under this framework, the one-sided hypothesis on a single BC value may be written as a restriction on the joint multinomial parameter space:

$$ H_0: \mathbf{p} \in \mathcal{S}_0 \quad \text{vs.} \quad H_1: \mathbf{p} \in \mathcal{S}_0^c \tag{6.2} $$

The likelihood function is a function of the parameters, $\mathbf{p}$, with the data assumed given. Thus, the likelihood is comprised of the multinomial pmf; however, it may be simplified by removing the constant multipliers which do not depend on the parameters. Therefore,

$$ L(\mathbf{p} \mid \mathbf{x}) \propto \prod_{i=1}^{k} \prod_{j=1}^{k} p_{i|j}^{x_{i|j}} \tag{6.9} $$

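An unrestricted maximization ($\sup_{\Theta} L(\mathbf{p} \mid \mathbf{x})$) of this likelihood results in the multinomial MLE, which is given by $\hat{p}_{i|j} = x_{i|j}/n_j$. If $\widehat{BC} \ge BC_0$ is observed, then $\hat{\mathbf{p}} \in \mathcal{S}_0$, which results in $\sup_{\Theta_0} L(\mathbf{p} \mid \mathbf{x}) = \sup_{\Theta} L(\mathbf{p} \mid \mathbf{x})$. Therefore,

$$
\lambda(\mathbf{x}) =
\begin{cases}
1 & \text{if } \widehat{BC} \ge BC_0 \\[4pt]
\dfrac{\sup_{\mathbf{p} \in \mathcal{S}_0} L(\mathbf{p} \mid \mathbf{x})}{L(\hat{\mathbf{p}} \mid \mathbf{x})} & \text{if } \widehat{BC} < BC_0
\end{cases} \tag{6.10}
$$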

The degrees of freedom for the test ($\nu$) is the difference of the number of free parameters in the unrestricted parameter space and the restricted parameter space, which is 1. The corresponding p-value for this large sample hypothesis test is

$$
p(\mathbf{x}) =
\begin{cases}
1 & \text{if } \widehat{BC} \ge BC_0 \\
\Pr\big(\chi^2_{\nu} > -2 \log \lambda(\mathbf{x})\big) & \text{if } \widehat{BC} < BC_0
\end{cases} \tag{6.11}
$$


For an observed $\widehat{BC}$ and a fixed $BC_0$, the hypothesis is tested by calculating the p-value in Equation 6.11 and comparing this value to the chosen significance level, $\alpha$. If $p(\mathbf{x})$ is less than $\alpha$, the null hypothesis is rejected.
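When the cost of misclassifying a class-$j$ observation does not depend on the assigned class (for example, equal costs), $BC = \sum_j w_j q_j$ with $w_j = C_j P_j$, $C_j$ the common value of $C_{i|j}$ for $i \ne j$, and $q_j = 1 - p_{j|j}$. The split of errors among wrong classes then drops out of Equation 6.10, and the restricted maximization reduces to independent binomial likelihoods for the misclassification counts. Because the log-likelihood is concave and $BC$ is linear in $q$, when $\widehat{BC} < BC_0$ the restricted maximum lies on the boundary $BC = BC_0$. The Python sketch below uses this reduction, a Lagrange multiplier, and bisection to obtain the restricted MLE and the p-value of Equation 6.11 with $\nu = 1$. It is a hypothetical illustration, not the `constrOptim`-based implementation used for the simulations in Section 6.4; general cost structures such as $Cost_1$ need a full constrained optimizer.

```python
import math


def restricted_q(m, n, c):
    """Maximize m*log(q) + (n - m)*log(1 - q) + c*q over q in [0, 1].

    Stationarity gives c*q**2 + (n - c)*q - m = 0; the rationalized root is
    used when n - c > 0 to avoid cancellation.
    """
    disc = max((n - c) ** 2 + 4.0 * c * m, 0.0)
    if n - c > 0:
        return 2.0 * m / ((n - c) + math.sqrt(disc))
    return ((c - n) + math.sqrt(disc)) / (2.0 * c)


def binom_loglik(m, n, q):
    """Binomial log-likelihood (constants dropped), using 0 * log(0) = 0."""
    total = 0.0
    for mj, nj, qj in zip(m, n, q):
        if mj > 0:
            total += mj * math.log(qj)
        if nj - mj > 0:
            total += (nj - mj) * math.log(1.0 - qj)
    return total


def bc_lrt_pvalue(m, n, w, bc0):
    """LRT p-value (Equation 6.11) for H0: BC >= BC0 vs H1: BC < BC0.

    Assumes BC = sum_j w[j] * q_j; m[j] misclassified out of n[j] for class j.
    Returns (p-value, BC_hat, restricted MLE of q).
    """
    q_hat = [mj / nj for mj, nj in zip(m, n)]
    bc_hat = sum(wj * qj for wj, qj in zip(w, q_hat))
    if bc_hat >= bc0:
        return 1.0, bc_hat, q_hat          # lambda(x) = 1
    if bc0 >= sum(w):
        raise ValueError("BC0 must be smaller than the largest attainable BC")

    def fit(lam):                          # maximizer for Lagrange multiplier lam
        return [restricted_q(mj, nj, lam * wj) for mj, nj, wj in zip(m, n, w)]

    def bc_at(lam):
        return sum(wj * qj for wj, qj in zip(w, fit(lam)))

    lo, hi = 0.0, 1.0
    while bc_at(hi) < bc0:                 # bc_at increases from bc_hat toward sum(w)
        hi *= 2.0
    for _ in range(200):                   # bisection for bc_at(lam) = bc0
        mid = 0.5 * (lo + hi)
        if bc_at(mid) < bc0:
            lo = mid
        else:
            hi = mid
    q_tilde = fit(hi)
    stat = max(2.0 * (binom_loglik(m, n, q_hat) - binom_loglik(m, n, q_tilde)), 0.0)
    return math.erfc(math.sqrt(stat / 2.0)), bc_hat, q_tilde  # P(chi2_1 > stat)


# Hypothetical data: n_j = 50, equal costs, P_j = 1, testing BC0 = 0.5.
print(bc_lrt_pvalue([2, 3, 1], [50, 50, 50], [1, 1, 1], 0.5)[0])
```

The identity $\Pr(\chi^2_1 > s) = \operatorname{erfc}(\sqrt{s/2})$ avoids any dependency beyond the standard library.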

6.3.2  One-sided  Hypothesis  Test  on  the  Difference  of  Two  Bayes  Cost  Values. 

To test the hypothesis

$$ H_0: \eta \le \eta_0 \quad \text{vs.} \quad H_1: \eta > \eta_0 \tag{5.3} $$


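using an LRT, the equations from Section 6.2.2 are used, where

$$ \eta = \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} C_{i|j,A} P_j p_{i|j,A} - \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} C_{i|j,B} P_j p_{i|j,B} \tag{6.4} $$

and

$$ Y = \hat{\eta} = \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} C_{i|j,A} P_j \frac{X_{i|j,A}}{n_{j,A}} - \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \ne i}}^{k} C_{i|j,B} P_j \frac{X_{i|j,B}}{n_{j,B}} \tag{6.5} $$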


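$\mathcal{X}_{A,B}$, $\mathbf{p}_A$, and $\mathbf{p}_B$ are defined as they were in Section 6.2.2. Recall that the hypothesis to be tested may be written as a restriction on the joint multinomial parameter space:

$$ H_0: (\mathbf{p}_A, \mathbf{p}_B) \in \mathcal{S}^2_0 \quad \text{vs.} \quad H_1: (\mathbf{p}_A, \mathbf{p}_B) \in (\mathcal{S}^2_0)^c \tag{6.7} $$

where $\mathcal{S}^2_0 = \{(\mathbf{p}_A, \mathbf{p}_B) : \mathbf{p}_A \in \mathcal{S},\ \mathbf{p}_B \in \mathcal{S},\ \text{and } \eta \le \eta_0\}$. The likelihood function for the joint multinomial distribution of both classification systems is

$$ L(\mathbf{p}_A, \mathbf{p}_B \mid \mathbf{x}_A, \mathbf{x}_B) \propto \prod_{i=1}^{k} \prod_{j=1}^{k} p_{i|j,A}^{x_{i|j,A}}\, p_{i|j,B}^{x_{i|j,B}} \tag{6.12} $$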
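An unrestricted maximization ($\sup_{\Theta} L(\mathbf{p}_A, \mathbf{p}_B \mid \mathbf{x}_A, \mathbf{x}_B)$) of this likelihood results in the multinomial MLEs, which are given by $\hat{p}_{i|j,A} = x_{i|j,A}/n_{j,A}$ and $\hat{p}_{i|j,B} = x_{i|j,B}/n_{j,B}$. If $\hat{\eta} \le \eta_0$ is observed, then $(\hat{\mathbf{p}}_A, \hat{\mathbf{p}}_B) \in \mathcal{S}^2_0$, which results in $\sup_{\Theta_0} L(\mathbf{p}_A, \mathbf{p}_B \mid \mathbf{x}_A, \mathbf{x}_B) = \sup_{\Theta} L(\mathbf{p}_A, \mathbf{p}_B \mid \mathbf{x}_A, \mathbf{x}_B)$. Therefore,

$$
\lambda(\mathbf{x}) =
\begin{cases}
1 & \text{if } \hat{\eta} \le \eta_0 \\[4pt]
\dfrac{\sup_{(\mathbf{p}_A, \mathbf{p}_B) \in \mathcal{S}^2_0} L(\mathbf{p}_A, \mathbf{p}_B \mid \mathbf{x}_A, \mathbf{x}_B)}{L(\hat{\mathbf{p}}_A, \hat{\mathbf{p}}_B \mid \mathbf{x}_A, \mathbf{x}_B)} & \text{if } \hat{\eta} > \eta_0
\end{cases} \tag{6.13}
$$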


The degrees of freedom for the test ($\nu$) is the difference of the number of free parameters in the unrestricted parameter space and the restricted parameter space, which is 1. The p-value for this hypothesis test is

$$
p(\mathbf{x}) =
\begin{cases}
1 & \text{if } \hat{\eta} \le \eta_0 \\
\Pr\big(\chi^2_{\nu} > -2 \log \lambda(\mathbf{x})\big) & \text{if } \hat{\eta} > \eta_0
\end{cases} \tag{6.14}
$$

For an observed $\hat{\eta}$ and a fixed $\eta_0$, the hypothesis is tested by calculating the p-value in Equation 6.14 and comparing this value to the chosen significance level, $\alpha$. For $p(\mathbf{x})$ less than $\alpha$, the null hypothesis is rejected.
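The same binomial reduction gives a sketch of the two-system LRT in Equation 6.14. It again assumes that $C_{i|j}$ does not depend on $i$ and that both systems share the weights $w_j = C_j P_j$ (assumptions of this sketch, not of the method). When $\hat{\eta} > \eta_0$, the restricted maximum lies on the boundary $\eta = \eta_0$, so one Lagrange multiplier enters with opposite signs for systems A and B. Function names are hypothetical, and the helpers are repeated so the block runs on its own.

```python
import math


def restricted_q(m, n, c):
    """Maximize m*log(q) + (n - m)*log(1 - q) + c*q over q in [0, 1]."""
    disc = max((n - c) ** 2 + 4.0 * c * m, 0.0)
    if n - c > 0:
        return 2.0 * m / ((n - c) + math.sqrt(disc))
    return ((c - n) + math.sqrt(disc)) / (2.0 * c)


def binom_loglik(m, n, q):
    """Binomial log-likelihood (constants dropped), using 0 * log(0) = 0."""
    total = 0.0
    for mj, nj, qj in zip(m, n, q):
        if mj > 0:
            total += mj * math.log(qj)
        if nj - mj > 0:
            total += (nj - mj) * math.log(1.0 - qj)
    return total


def eta_lrt_pvalue(mA, nA, mB, nB, w, eta0=0.0):
    """LRT p-value (Equation 6.14) for H0: eta <= eta0 vs H1: eta > eta0.

    Assumes BC = sum_j w[j] * q_j for both systems, where q_j is the class-j
    misclassification probability. Returns (p-value, eta_hat, (qA, qB)).
    """
    qA_hat = [m / n for m, n in zip(mA, nA)]
    qB_hat = [m / n for m, n in zip(mB, nB)]
    eta_hat = sum(wj * (a - b) for wj, a, b in zip(w, qA_hat, qB_hat))
    if eta_hat <= eta0:
        return 1.0, eta_hat, (qA_hat, qB_hat)   # lambda(x) = 1
    if eta0 <= -sum(w):
        raise ValueError("eta0 must exceed the smallest attainable eta")

    def fit(lam):
        # Stationarity: +lam * w_j enters for system A and -lam * w_j for system B.
        qA = [restricted_q(m, n, lam * wj) for m, n, wj in zip(mA, nA, w)]
        qB = [restricted_q(m, n, -lam * wj) for m, n, wj in zip(mB, nB, w)]
        return qA, qB

    def eta_at(lam):
        qA, qB = fit(lam)
        return sum(wj * (a - b) for wj, a, b in zip(w, qA, qB))

    lo, hi = -1.0, 0.0                      # eta_at increases in lam; eta_at(0) = eta_hat
    while eta_at(lo) > eta0:
        lo *= 2.0
    for _ in range(200):                    # bisection for eta_at(lam) = eta0
        mid = 0.5 * (lo + hi)
        if eta_at(mid) > eta0:
            hi = mid
        else:
            lo = mid
    qA_t, qB_t = fit(lo)
    stat = 2.0 * (binom_loglik(mA, nA, qA_hat) + binom_loglik(mB, nB, qB_hat)
                  - binom_loglik(mA, nA, qA_t) - binom_loglik(mB, nB, qB_t))
    stat = max(stat, 0.0)
    return math.erfc(math.sqrt(stat / 2.0)), eta_hat, (qA_t, qB_t)


# Hypothetical data: system A errs more often than system B; test eta0 = 0.
print(eta_lrt_pvalue([10, 12, 8], [50] * 3, [2, 3, 1], [50] * 3, [1, 1, 1])[0])
```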


6.4  Simulation  Results 

A simulation study was conducted to demonstrate the performance of the exact and likelihood ratio hypothesis tests for BC and $\eta$. Various scenarios are considered, including different sample sizes ($n_j = 5, 10, 20, 30$ for the exact test and $n_j = 10, 50, 100, 250$ for the LRT), differing costs associated with the misclassifications, and classification accuracy (measured by the $BC_0$ or $\eta_0$ value). All scenarios make no assumptions about the structure of the underlying classification system or feature distributions, and therefore the classification outcomes are simulated with random draws from multinomial distributions. The exact method is appropriate for small sample sizes and the LRT method is appropriate for larger sample sizes, which is why they are simulated with different sample size scenarios. However, due to the LRT's good performance at $n_j = 10$, further comparisons between the LRT and the exact method are made with small sample sizes using power curves (Section 6.4.1). The performance of the tests is measured by their power and size (Definitions 5 and 6, Section 5.4). Once again, this is accomplished by determining the probability of rejecting the null hypothesis for multiple BC (or $\eta$) values.
In Section 6.4.1, the performance of the exact and likelihood ratio one-sided hypothesis tests on a single BC value is evaluated. In Section 6.4.2, the performance of these tests on the difference of two BC values is evaluated. All simulations are run in R assuming a significance level of $\alpha = 0.05$ with 3000 simulation runs [52]. The LRT requires the maximization of the likelihood given the observed data over the null parameter space. This is accomplished by performing a constrained maximization of the multinomial log-likelihood in R using the function constrOptim with method "Nelder-Mead" [52].
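As a rough yardstick for reading the simulated sizes in Tables 6.1 through 6.4, the Monte Carlo standard error of a rejection rate estimated from 3000 runs follows from the binomial variance. The short Python sketch below computes it for a true rate of $\alpha = 0.05$, together with a normal-approximation 95% band; the helper name is hypothetical and the band is a standard approximation, not part of the methods developed here.

```python
import math


def mc_standard_error(rate, runs):
    """Binomial standard error of a rejection rate estimated from `runs` replicates."""
    return math.sqrt(rate * (1.0 - rate) / runs)


alpha, runs = 0.05, 3000
se = mc_standard_error(alpha, runs)
band = (alpha - 1.96 * se, alpha + 1.96 * se)   # normal-approximation 95% band
print(f"SE = {se:.4f}; 95% band for an estimated size: [{band[0]:.3f}, {band[1]:.3f}]")
# prints: SE = 0.0040; 95% band for an estimated size: [0.042, 0.058]
```

Estimated sizes well outside roughly this band are unlikely to be explained by Monte Carlo noise alone.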

6.4.1  One-sided  Hypothesis  Test  on  a  Single  Bayes  Cost  Value. 

For consistency, the same $BC_0$ values and cost structures used to demonstrate the performance of the parametric hypothesis tests in Section 5.4.1 are also used in this section. Recall that four $BC_0$ values are used to demonstrate a range of test performances. Under the assumption of equal costs on all misclassification probabilities, $BC_0 = 0.3, 0.5, 1.0, 1.25$. For the two additional cost structures

$$
Cost_1 = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix}
\quad \text{and} \quad
Cost_2 = \begin{pmatrix} 0 & 2 & 5 \\ 1 & 0 & 3 \\ 1 & 3 & 0 \end{pmatrix},
\qquad P_j = 1,
$$

$BC_0^{Cost_1} = 0.1, 0.2, 0.35, 0.45$ and $BC_0^{Cost_2} = 0.2, 0.4, 0.7, 0.9$. For all simulated BC values, it is assumed that the misclassification probabilities are equally distributed among the multinomial misclassification outcomes. The size and power of the exact and likelihood ratio hypothesis tests are presented in Table 6.1 for equal weights, and in Tables 6.2 and 6.3 for $Cost_1$ and $Cost_2$, respectively.
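One way to generate the simulated classification outcomes described above is sketched below in Python. The sketch assumes that "equally distributed" means every off-diagonal cell $p_{i|j}$, $i \ne j$, receives the same probability $t$, chosen so that $\sum_{i \ne j} C_{i|j} P_j t$ equals the target BC; it then draws multinomial counts for each true class with `random.choices`. The function names are hypothetical, and the original simulations were run in R [52]. Because every off-diagonal cell gets the same $t$ and $P_j = 1$ here, the result does not depend on whether the cost matrices are read row-wise or column-wise.

```python
import random


def equal_split_probs(bc, cost, prior):
    """p[j][i] = p_{i|j} with the same probability t on every off-diagonal cell.

    t is chosen so that sum_{i != j} C_{i|j} * P_j * t equals the target bc.
    Assumed indexing: cost[j][i] = C_{i|j}, prior[j] = P_j.
    """
    k = len(cost)
    total = sum(cost[j][i] * prior[j] for j in range(k) for i in range(k) if i != j)
    t = bc / total
    if (k - 1) * t > 1:
        raise ValueError("target BC is not attainable with an equal split")
    return [[1 - (k - 1) * t if i == j else t for i in range(k)] for j in range(k)]


def draw_counts(p, n, rng):
    """One simulated outcome: multinomial counts x_{i|j} for each true class j."""
    counts = []
    for probs, nj in zip(p, n):
        labels = rng.choices(range(len(probs)), weights=probs, k=nj)
        counts.append([labels.count(i) for i in range(len(probs))])
    return counts


cost1 = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
p = equal_split_probs(0.45, cost1, [1, 1, 1])   # t = 0.45 / 8
print(draw_counts(p, [10, 10, 10], random.Random(2024)))
```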

It is clear from these results that the exact hypothesis test is an $\alpha$ or smaller sized test (i.e., an $\alpha$ level test). Also, as expected, the exact hypothesis test is conservative and the power of the test increases as $n_j$ increases (Table 6.1). For $BC_0 = 0.3$ and $0.5$ and $n_j = 5$, the test will never reject the null hypothesis. For these two $BC_0$ scenarios, the p-values of the tests at an observed $\widehat{BC} = 0$ are 0.21 and 0.06, respectively. Therefore, with $n_j = 5$ these two tests never have enough power to reject the null hypothesis at $\alpha = 0.05$. For $BC_0 = 0.5$, the null hypothesis could be rejected for $\widehat{BC} = 0$ with a significance level greater than 0.06. Similar results with respect to p-values and power hold for the exact test under $Cost_1$ and $Cost_2$ (Tables 6.2 and 6.3). Notably, these cost structures result in decreased power for the exact test.

The LRT ($n_j \ge 10$) is also an $\alpha$ level test (Tables 6.1 through 6.3). Like the exact test, its power increases with increasing $n_j$. There are also scenarios where this test never rejects the null hypothesis, for reasons comparable to those for the exact test (Tables 6.2 and 6.3). Finally, when the costs on the misclassification probabilities are not equal, the LRT generally has higher power than the exact test for the same sample size scenario ($n_j = 10$), with some exceptions for small $BC_0$ values.

To compare the exact test and the LRT further, power curves were plotted for differing $BC_0$ values assuming equal costs and small sample size scenarios (Figure 6.1). These plots visually demonstrate the similar performance of both hypothesis test methods. Although the LRT is more powerful than the exact test at $n_j = 5$, the LRT also has size greater than $\alpha$ at this sample size. For the larger sample sizes considered with the power curves ($n_j = 20, 30$), the exact test is more powerful than the LRT (see Figure 6.1). The power curves also make clear that detecting a more accurate classification system (smaller $BC_0$ value) requires larger sample sizes.
Table 6.1: Power when the misclassifications have equal weights. Detectable difference indicates the difference of the assumed true BC value and $BC_0$ ($BC < BC_0$). The power at a detectable difference of zero is the estimated size of the hypothesis test.


Detectable    Exact Hypothesis Test           Likelihood Ratio Test
Difference    nj=5    10      20      30      10      50      100     250
BC0 = 0.30
0 (α)         0.000   0.045   0.017   0.051   0.037   0.031   0.027   0.026
0.01          0.000   0.049   0.017   0.060   0.045   0.040   0.042   0.051
0.05          0.000   0.070   0.033   0.120   0.071   0.110   0.168   0.349
0.10          0.000   0.128   0.088   0.270   0.128   0.317   0.554   0.911
0.20          0.000   0.360   0.402   0.815   0.367   0.929   0.997   1.000
0.30          0.000   1.000   1.000   1.000   0.992   1.000   1.000   1.000
BC0 = 0.50
0 (α)         0.000   0.030   0.024   0.023   0.027   0.027   0.025   0.026
0.01          0.000   0.033   0.025   0.028   0.029   0.034   0.034   0.044
0.05          0.000   0.057   0.046   0.063   0.041   0.077   0.110   0.228
0.10          0.000   0.082   0.089   0.143   0.068   0.192   0.339   0.722
0.20          0.000   0.183   0.260   0.448   0.169   0.662   0.920   1.000
0.30          0.000   0.395   0.616   0.855   0.389   0.975   1.000   1.000
0.40          0.000   0.729   0.949   0.997   0.727   1.000   1.000   1.000
0.50          0.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
BC0 = 1.00
0 (α)         0.023   0.038   0.037   0.044   0.036   0.027   0.026   0.023
0.01          0.021   0.038   0.041   0.050   0.039   0.031   0.034   0.039
0.05          0.028   0.051   0.063   0.088   0.054   0.062   0.089   0.155
0.10          0.040   0.080   0.099   0.152   0.080   0.132   0.235   0.500
0.20          0.066   0.156   0.223   0.368   0.153   0.418   0.720   0.980
0.30          0.111   0.260   0.445   0.658   0.263   0.773   0.973   1.000
0.40          0.170   0.412   0.688   0.887   0.429   0.960   0.999   1.000
0.50          0.256   0.610   0.882   0.978   0.617   0.997   1.000   1.000
BC0 = 1.25
0 (α)         0.017   0.028   0.046   0.043   0.028   0.022   0.025   0.020
0.01          0.022   0.033   0.051   0.049   0.031   0.026   0.032   0.032
0.05          0.025   0.041   0.076   0.083   0.044   0.056   0.084   0.150
0.10          0.037   0.066   0.118   0.142   0.065   0.119   0.211   0.462
0.20          0.060   0.118   0.246   0.324   0.129   0.375   0.661   0.969
0.30          0.106   0.210   0.444   0.599   0.220   0.705   0.947   1.000
0.40          0.160   0.343   0.672   0.819   0.364   0.923   0.997   1.000
0.50          0.235   0.506   0.842   0.947   0.520   0.992   1.000   1.000
Table 6.2: Power when the misclassifications have a cost structure given by $Cost_1$. Detectable difference indicates the difference of the assumed true BC value and $BC_0$ ($BC < BC_0$). The power at a detectable difference of zero is the estimated size of the hypothesis test.


Detectable    Exact Hypothesis Test   Likelihood Ratio Test
Difference    nj=5    10      20      10      50      100     250
BC0 = 0.10
0 (α)         0.000   0.000   0.039   0.000   0.037   0.028   0.023
0.01          0.000   0.000   0.057   0.000   0.070   0.073   0.118
0.05          0.000   0.000   0.248   0.000   0.451   0.747   0.990
0.10          0.000   0.000   0.984   0.000   1.000   1.000   1.000
BC0 = 0.20
0 (α)         0.000   0.043   0.020   0.034   0.027   0.027   0.023
0.01          0.000   0.047   0.023   0.044   0.045   0.055   0.082
0.05          0.000   0.100   0.079   0.094   0.240   0.415   0.802
0.10          0.000   0.251   0.332   0.262   0.772   0.972   1.000
0.15          0.000   0.571   0.784   0.565   0.996   1.000   1.000
0.20          0.000   1.000   1.000   1.000   1.000   1.000   1.000
BC0 = 0.35
0 (α)         0.010   0.013   0.024   0.045   0.025   0.026   0.032
0.01          0.012   0.019   0.024   0.051   0.042   0.045   0.070
0.05          0.022   0.028   0.070   0.095   0.169   0.278   0.595
0.10          0.045   0.083   0.210   0.183   0.518   0.817   0.995
0.15          0.087   0.190   0.466   0.331   0.883   0.995   1.000
0.20          0.169   0.387   0.761   0.539   0.995   1.000   1.000
0.25          0.306   0.641   0.956   0.773   1.000   1.000   1.000
0.30          0.573   0.887   1.000   0.950   1.000   1.000   1.000
BC0 = 0.45
0 (α)         0.011   0.023   0.020   0.045   0.026   0.023   0.026
0.01          0.012   0.028   0.025   0.055   0.040   0.043   0.056
0.05          0.029   0.056   0.066   0.096   0.149   0.241   0.529
0.10          0.051   0.122   0.177   0.169   0.446   0.739   0.983
0.15          0.082   0.224   0.381   0.291   0.804   0.979   1.000
0.20          0.145   0.389   0.637   0.445   0.974   0.998   1.000
0.25          0.240   0.604   0.865   0.636   0.998   1.000   1.000
0.30          0.374   0.800   0.978   0.813   1.000   1.000   1.000
Table 6.3: Power when the misclassifications have a cost structure given by $Cost_2$. Detectable difference indicates the difference of the assumed true BC value and $BC_0$ ($BC < BC_0$). The power at a detectable difference of zero is the estimated size of the hypothesis test.


Detectable    Exact Hypothesis Test   Likelihood Ratio Test
Difference    nj=5    10      20      10      50      100     250
BC0 = 0.20
0 (α)         0.000   0.000   0.038   0.000   0.040   0.031   0.030
0.01          0.000   0.000   0.043   0.000   0.051   0.045   0.057
0.05          0.000   0.000   0.094   0.000   0.156   0.215   0.451
0.10          0.000   0.000   0.218   0.000   0.427   0.697   0.980
0.15          0.000   0.000   0.500   0.000   0.829   0.987   1.000
0.20          0.000   0.000   1.000   0.000   1.000   1.000   1.000
BC0 = 0.40
0 (α)         0.000   0.000   0.024   0.026   0.030   0.029   0.027
0.01          0.000   0.000   0.030   0.029   0.041   0.039   0.048
0.05          0.000   0.000   0.058   0.046   0.101   0.146   0.285
0.10          0.000   0.000   0.113   0.086   0.252   0.436   0.815
0.15          0.000   0.000   0.211   0.157   0.512   0.804   0.994
0.20          0.000   0.000   0.364   0.268   0.799   0.979   1.000
0.25          0.000   0.000   0.576   0.419   0.963   1.000   1.000
0.30          0.000   0.000   0.805   0.646   1.000   1.000   1.000
BC0 = 0.70
0 (α)         0.000   0.019   0.024   0.046   0.029   0.030   0.024
0.01          0.000   0.023   0.030   0.048   0.035   0.037   0.036
0.05          0.000   0.032   0.052   0.065   0.077   0.102   0.183
0.10          0.000   0.050   0.087   0.088   0.165   0.272   0.601
0.15          0.000   0.075   0.158   0.131   0.308   0.534   0.898
0.20          0.000   0.110   0.226   0.188   0.497   0.789   0.991
0.25          0.000   0.154   0.354   0.241   0.695   0.944   1.000
0.30          0.000   0.219   0.490   0.319   0.855   0.989   1.000
BC0 = 0.90
0 (α)         0.011   0.018   0.020   0.043   0.028   0.026   0.023
0.01          0.010   0.022   0.023   0.046   0.033   0.035   0.037
0.05          0.018   0.032   0.049   0.060   0.072   0.093   0.162
0.10          0.024   0.048   0.081   0.087   0.147   0.242   0.525
0.15          0.032   0.074   0.125   0.129   0.270   0.467   0.851
0.20          0.046   0.117   0.190   0.168   0.435   0.711   0.980
0.25          0.056   0.156   0.288   0.213   0.616   0.890   1.000
0.30          0.070   0.213   0.417   0.265   0.771   0.974   1.000
Figure 6.1: Power curves for $n_j = 5$ (red), $n_j = 10$ (blue), $n_j = 20$ (green), and $n_j = 30$ (purple) at different $BC_0$ values.
6.4.2 One-sided Hypothesis Test on the Difference of Two Bayes Cost Values

For testing the difference of two independent classification systems, η_0 = 0 is used. To consider different detectable differences for the test, BC_a is fixed at 0.8 and BC_b is varied (BC_b ∈ {0.3, ..., 0.8}) to simulate the desired η values. Multinomial random variables are generated assuming the misclassification probabilities are evenly distributed among the classes for all BC values.

For the exact hypothesis test, the sample space for two independent three-class classification systems (J_{a,b}, used to consider BC_a and BC_b simultaneously) becomes very large, and so does the computational time. Therefore, the test is run only for small sample sizes and assuming all C_{i|j} = 1 for i ≠ j (which allows binomial distributions to be used instead of multinomial distributions, reducing the sample space). The results are presented in Table 6.4. The exact and LRT hypothesis tests perform similarly with respect to power and sample size, although for n_j = 10 the exact test is more powerful than the LRT. Also, both tests have size < α.
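The simulation design above can be sketched as follows. This is a minimal illustration, not the code behind Table 6.4. It assumes three classes, unit costs C_{i|j} = 1, and equal class weights, so that BC is simply the sum of the class misclassification rates (which is how the reported BC values behave in the equal-cost kidney example of Section 7.2), and it reads "evenly distributed" as each class having misclassification rate BC/k. Under these assumptions each class contributes a binomial count, the reduction mentioned above. The function names are hypothetical.

```python
import random
import statistics


def simulate_bc_hat(bc, n_j, rng, k=3):
    """One simulated empirical Bayes Cost for a k-class system.

    Assumes unit costs and equal class weights, so BC is the sum of the k
    class misclassification rates, each set to bc / k ("evenly distributed").
    """
    q = bc / k  # per-class misclassification probability (assumption)
    total = 0.0
    for _ in range(k):
        # With unit costs only the number misclassified matters, so each
        # class contributes a Binomial(n_j, q) count.
        misclassified = sum(rng.random() < q for _ in range(n_j))
        total += misclassified / n_j
    return total


def simulate_eta_hat(bc_a, bc_b, n_j, reps, seed=2024):
    """Monte Carlo draws of eta_hat = BC_a_hat - BC_b_hat for two independent systems."""
    rng = random.Random(seed)
    return [
        simulate_bc_hat(bc_a, n_j, rng) - simulate_bc_hat(bc_b, n_j, rng)
        for _ in range(reps)
    ]


if __name__ == "__main__":
    # BC_a is fixed at 0.8 while BC_b moves through part of the range 0.3, ..., 0.8.
    for bc_b in (0.8, 0.6, 0.3):
        draws = simulate_eta_hat(0.8, bc_b, n_j=10, reps=5000)
        print(f"true eta = {0.8 - bc_b:.1f}, mean of eta_hat = {statistics.fmean(draws):.3f}")
```

Estimating power would then mean applying the exact test or the LRT to each simulated pair and recording the rejection rate; that step depends on the test statistics of Section 6.3 and is not sketched here.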


Table 6.4: Power for multinomially distributed classes with equal weights for testing η ≤ 0. The detectable difference is the difference between the assumed true values, BC_a − BC_b (η ≥ 0). The power at a detectable difference of zero is the estimated size of the hypothesis test.


Detectable    Exact Test            Likelihood Ratio Test
Difference    n_j = 5   n_j = 10    n_j = 10   n_j = 50   n_j = 100   n_j = 250
0 (α)         0.045     0.038       0.042      0.031      0.030       0.029
0.01          0.052     0.042       0.044      0.039      0.039       0.033
0.05          0.072     0.078       0.051      0.063      0.078       0.102
0.10          0.112     0.141       0.070      0.114      0.169       0.315
0.20          0.197     0.298       0.125      0.301      0.496       0.854
0.30          0.326     0.540       0.208      0.585      0.866       0.997
0.40          0.507     0.752       0.332      0.854      0.989       1.000
0.50          0.687     0.910       0.493      0.977      1.000       1.000
6.5 Summary

Two nonparametric methods for testing hypotheses on BC were derived: an exact test for small sample sizes and an LRT based on large-sample theory. An interesting result from the simulation is the similar performance of the exact and LRT hypothesis tests, especially for the hypothesis test on a single BC value. Although the LRT is an approximate method, it performs similarly to the exact test with respect to power, even for the small sample size n_j = 10. Because the sample space of BC is discrete, the approximate LRT p-values are accurate enough to lead to the same decision as the exact test for some observed values of BC. This is demonstrated in Table 6.5 for testing different BC_0 values for a three-class classification system with n_j = 10 and observed BC = 0.1. In this example, although the LRT p-values differ from the exact p-values, they lead to the same decision (rejecting or failing to reject the null hypothesis) at α = 0.05. Consequently, the two methods at times have similar performance with respect to size and power.


Table 6.5: P-values for exact and likelihood ratio tests in a three-class scenario for testing a single BC_0 value with n_j = 10 and BC = 0.1.

BC_0    Exact p-value    LRT p-value
0.3     0.184            0.127
0.5     0.029            0.011
1       8.34E-05         1.52E-05
1.25    2.89E-08         3.14E-07

Another result of interest is that when the misclassification weights are unequal, the likelihood ratio test generally has slightly higher power than the exact test (although this comparison is made only for n_j = 10). The exact hypothesis test was implemented to calculate a p-value by searching the null probability space in increments of 0.05. Therefore, a finer search method for finding these exact p-values may yield more precise (less conservative) values, which could increase the power of this test.
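A grid search of this kind can be sketched for the simplest setting. This is our reconstruction under stated assumptions, not the dissertation's implementation: three classes with unit costs and equal weights, so that the estimate is BĈ = Σ_j X_j/n_j with independent X_j ~ Binomial(n_j, q_j) (the binomial reduction of Section 6.4.2); a one-sided alternative H1: BC < BC_0, consistent with the p-values in Table 6.5 shrinking as BC_0 grows; and an exact p-value taken as the largest probability, over null parameter values on a 0.05 grid, of a BĈ as small as the one observed. The function names are hypothetical. With n_j = 10 and observed BĈ = 0.1 it gives 0.184 at BC_0 = 0.3 and 0.029 at BC_0 = 0.5, matching the first two exact p-values in Table 6.5; the coarse grid does not reproduce the BC_0 = 1 and 1.25 rows.

```python
import itertools
import math


def binom_pmf(n, q):
    """Binomial(n, q) probabilities for counts 0..n."""
    return [math.comb(n, x) * q**x * (1 - q) ** (n - x) for x in range(n + 1)]


def prob_bc_hat_at_most(b_obs, n, q):
    """P(sum_j X_j / n_j <= b_obs) for independent X_j ~ Binomial(n_j, q_j)."""
    pmfs = [binom_pmf(n_j, q_j) for n_j, q_j in zip(n, q)]
    total = 0.0
    for counts in itertools.product(*(range(n_j + 1) for n_j in n)):
        bc_hat = sum(x / n_j for x, n_j in zip(counts, n))
        if bc_hat <= b_obs + 1e-12:  # tolerance for floating-point sums
            prob = 1.0
            for pmf, x in zip(pmfs, counts):
                prob *= pmf[x]
            total += prob
    return total


def exact_p_value(b_obs, bc0, n, step=0.05):
    """Sup over grid points with sum(q) >= bc0 of P(BC_hat <= b_obs), three classes.

    P(BC_hat <= b_obs) decreases as any q_j grows, so only grid points whose
    indices sum to the smallest admissible total m need to be searched.
    """
    k_max = round(1 / step)            # grid index for q_j = 1
    m = math.ceil(bc0 / step - 1e-9)   # smallest grid sum that is >= bc0
    best = 0.0
    for i1 in range(min(m, k_max) + 1):
        for i2 in range(min(m - i1, k_max) + 1):
            i3 = m - i1 - i2
            if i3 > k_max:
                continue
            q = (i1 * step, i2 * step, i3 * step)
            best = max(best, prob_bc_hat_at_most(b_obs, n, q))
    return best


if __name__ == "__main__":
    for bc0 in (0.3, 0.5):
        print(bc0, round(exact_p_value(0.1, bc0, (10, 10, 10)), 3))
```

A finer step (or a continuous optimizer over the null boundary) is the kind of improved search the paragraph above suggests; at BC_0 = 0.3 the grid already contains the maximizing point q = (0.1, 0.1, 0.1).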




The methods developed in this section provide flexible hypothesis tests which may be used for testing the performance of a single classification system or for comparing performance between classification systems. These hypothesis tests may be implemented despite differing classification structures or nonparametric scenarios. The exact hypothesis tests perform well, but are computationally difficult for increasing sample size (especially for tests on η). The LRTs therefore provide an approximate alternative to the exact test that is easier to implement computationally, especially for larger n_j.
VII. Applications


7.1  Classifying  Breast  Cancer 

The methods proposed in Chapter 3 are used to distinguish classes of the Breast Tissue data set from the UCI Machine Learning Repository [4]. This data set consists of 106 observations of nine continuous features derived from the truncated electrical impedance spectroscopy spectrum of breast tissue, which have been shown to discriminate breast tissue into six categories: Carcinoma (CAR, n=21), Fibro-adenoma (FAD, n=15), Mastopathy (MAS, n=18), Glandular (GLA, n=16), Connective (CON, n=14), and Adipose (ADI, n=22) [61]. By grouping the classes GLA, FAD, and MAS together (denoted FAD+MAS+GLA), this becomes a four-class classification problem. These three classes are grouped together because their discrimination is not considered important and they cannot be discriminated using the available features [4, 61]. In [61], linear discriminant analysis was used to distinguish between various subgroups of classes, and it was determined that the low frequency limit (I_0), the area under the spectrum normalized by the impedance distance between spectral ends (AREA_DA), and the maximum of the spectrum (IP_max) were the best features for discriminating between classes of freshly excised breast tissue. However, it was also suggested that the length of the spectral curve (P) may be able to simultaneously discriminate between the four derived classes of interest [61]. This four-class diagnostic scenario is addressed using the derived parametric methods, considering these four features as potential class discriminators (I_0, AREA_DA, IP_max, and P).

Mean, standard deviation, median, and range of the four features for each class are presented in Table 7.1. P appears to have small overlap between all groups compared to the other features, indicating it may perform well as a classifier. IP_max and I_0 have significant overlap between the CAR group and at least one other class (CON for IP_max and FAD+MAS+GLA for I_0). AREA_DA has substantial overlap between all classes. The methods developed in Chapter 3 require normality of the feature used for classification; however, the mean and median data indicate that some of the features may be skewed. The Shapiro-Wilk test, which performs well compared to other goodness-of-fit tests [32], is used to test this assumption. The assumption of normality is met for IP_max only, so a Box-Cox transformation is used to transform the other three features to normality, where

$$ \text{Feature}_{\text{transformed}} = \frac{\text{Feature}^{\lambda} - 1}{\lambda}. \tag{7.1} $$

This results in λ = 0.09 for AREA_DA and λ = −0.31 for both I_0 and P, found using the powerTransform function in the car package in R [24, 52, 60]. After the transformation, all classes pass the test for normality except connective tissue, with p-values of 0.014 and 0.047 for I_0 and P, respectively. As was demonstrated in Chapter 3, these slight deviations from normality are not expected to have a large negative impact on the CI around BC; however, the CIs around the optimal thresholds may not perform well.
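Equation (7.1) and the choice of λ can be illustrated with a short sketch. The analysis above used powerTransform from the R car package, which estimates λ by maximum likelihood; the Python below is not that implementation but a simple stand-in built on the same idea: pick the λ that maximizes the normal profile log-likelihood of the transformed data, including the Jacobian term (λ − 1)Σ log x, here by grid search over [−2, 2] in steps of 0.01. The function names and the grid are assumptions for illustration, and the data must be strictly positive.

```python
import math
import statistics


def box_cox(values, lam):
    """Box-Cox transform of Equation (7.1); lam = 0 is the log limit."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v**lam - 1) / lam for v in values]


def profile_log_likelihood(values, lam):
    """Normal log-likelihood of the transformed data, up to a constant."""
    n = len(values)
    y = box_cox(values, lam)
    mean_y = statistics.fmean(y)
    var_mle = sum((v - mean_y) ** 2 for v in y) / n  # MLE of the variance (divisor n)
    jacobian = (lam - 1) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(var_mle) + jacobian


def estimate_lambda(values, lo=-2.0, hi=2.0, step=0.01):
    """Grid-search maximizer of the profile log-likelihood over [lo, hi]."""
    n_steps = round((hi - lo) / step)
    grid = [round(lo + i * step, 10) for i in range(n_steps + 1)]
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))


if __name__ == "__main__":
    # Log-normal-shaped toy data (exponentiated normal quantiles): lambda near 0 expected.
    n = 200
    z = [statistics.NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    print(estimate_lambda([math.exp(v) for v in z]))
```

A grid keeps the sketch transparent; a one-dimensional optimizer would give a more precise λ with fewer likelihood evaluations.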

Prevalences are adjusted to account for the FAD+MAS+GLA class being the combination of three classes, resulting in prevalences of p_FAD+MAS+GLA = 1/2 and p_CAR = p_CON = p_ADI = 1/6. All four features (I_0, AREA_DA, IP_max, and P) are considered separately as potential features to discriminate between the four classes (with equal cost given to all misclassification rates). For each feature, BC_4 and its 95% CI are determined using Equation 2.16 and the GCI presented in Section 3.3.3, where


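$$ P_{1|j} = \Phi\!\left(\frac{\theta_1 - \mu_j}{\sigma_j}\right) \tag{7.2} $$

$$ P_{2|j} = \Phi\!\left(\frac{\theta_2 - \mu_j}{\sigma_j}\right) - \Phi\!\left(\frac{\theta_1 - \mu_j}{\sigma_j}\right) \tag{7.3} $$

$$ P_{3|j} = \Phi\!\left(\frac{\theta_3 - \mu_j}{\sigma_j}\right) - \Phi\!\left(\frac{\theta_2 - \mu_j}{\sigma_j}\right) \tag{7.4} $$

$$ P_{4|j} = \Phi\!\left(\frac{\mu_j - \theta_3}{\sigma_j}\right) \tag{7.5} $$

and Φ is the standard normal CDF [52]. The GCIs are chosen for this application over the delta method CIs because of the sample sizes in each class.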
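Equations (7.2) to (7.5) translate directly into code. The sketch below computes the matrix of rates P_{i|j} for normally distributed classes ordered by mean, and then a cost-weighted sum of the off-diagonal rates. Equation 2.16 is not reproduced in this chapter, so the per-class `weights` argument is our hypothetical stand-in for its prevalence terms; with unit costs and all weights equal to 1 the result is the plain sum of the class misclassification rates. The means, standard deviations, and thresholds in the usage lines are made-up values, not the breast tissue estimates.

```python
from statistics import NormalDist


def class_rates(means, sds, thresholds):
    """Matrix P[i][j] = P(assign class i | true class j), Equations (7.2)-(7.5).

    Assumes classes are ordered by mean and thresholds are increasing, so class i
    is assigned when the feature falls between thresholds i-1 and i.
    """
    k = len(means)
    assert len(thresholds) == k - 1
    rates = [[0.0] * k for _ in range(k)]
    for j, (mu, sd) in enumerate(zip(means, sds)):
        # CDF of class j at each cut point, padded with 0 and 1 at the ends.
        cdf = [0.0] + [NormalDist(mu, sd).cdf(t) for t in thresholds] + [1.0]
        for i in range(k):
            rates[i][j] = cdf[i + 1] - cdf[i]
    return rates


def bayes_cost(rates, cost, weights=None):
    """Sum over j of weights[j] * sum over i != j of cost[i][j] * rates[i][j].

    weights is a hypothetical stand-in for the prevalence terms of Equation 2.16.
    """
    k = len(rates)
    weights = weights or [1.0] * k
    return sum(
        weights[j] * cost[i][j] * rates[i][j]
        for j in range(k)
        for i in range(k)
        if i != j
    )


if __name__ == "__main__":
    # Hypothetical, well-separated classes with thresholds at the midpoints.
    P = class_rates([0, 3, 6, 9], [1, 1, 1, 1], [1.5, 4.5, 7.5])
    unit = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
    print(round(bayes_cost(P, unit), 3))
```

Minimizing `bayes_cost` over the thresholds (for fixed means and standard deviations) would give the optimal point; the sketch stops at evaluating a given set of thresholds.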

Because the CAR class may be considered the most important to detect, a second cost structure is assumed, which places a greater cost on misclassifying a CAR subject as any of the other classes and also a higher cost on misclassifying a subject from any of the other three classes as CAR. Assuming an ordering of the class means where μ_FAD+MAS+GLA < μ_CAR < μ_CON < μ_ADI, this results in the cost structure

$$ \text{Cost} = \begin{bmatrix} 0 & 9 & 4 & 4 \\ 6 & 0 & 6 & 6 \\ 4 & 9 & 0 & 4 \\ 4 & 9 & 4 & 0 \end{bmatrix}, $$

where the entry in row i and column j is the cost C_{i|j} of assigning a class-j subject to class i (the cost structure is adjusted appropriately for features with a different ordering). Once again, the BC_4 value and associated 95% GCI are determined for all four features. The BC_4 values and 95% CIs for each feature and cost structure are given in Table 7.1.


Table 7.1: Descriptive statistics for features (broken into four classes: FAD+MAS+GLA, CAR, CON, ADI) to classify breast tissue, and each feature's BC_4 values with 95% generalized confidence intervals.

Feature   Class          Mean      Standard Deviation   Median    Range
P         FAD+MAS+GLA    283.38    106.30               252.48    [124.98, 553.38]
          CAR            479.97    93.19                477.55    [329.09, 656.77]
          CON            1065.00   356.07               1121.19   [528.70, 1524.61]
          ADI            2138.75   386.51               2068.05   [1475.37, 2896.52]
          BC_4, equal costs: 0.65 (0.49, 0.91)
          BC_4, unequal costs: 1.02 (0.75, 1.46)
IP_max    FAD+MAS+GLA    27.20     10.22                26.86     [7.97, 49.33]
          CAR            64.53     18.85                69.39     [35.60, 96.56]
          CON            72.96     34.45                70.10     [23.98, 143.09]
          ADI            194.60    106.56               164.63    [51.85, 436.10]
          BC_4, equal costs: 0.89 (0.73, 1.16)
          BC_4, unequal costs: 1.32 (1.08, 1.74)
I_0       FAD+MAS+GLA    259.73    104.22               245       [103.00, 544.65]
          CAR            394.23    87.04                389.87    [269.50, 551.88]
          CON            1212.86   386.47               1328.17   [649.37, 1724.09]
          ADI            2052.05   342.49               1974.56   [1600.00, 2800.00]
          BC_4, equal costs: 0.77 (0.58, 1.04)
          BC_4, unequal costs: 1.21 (0.90, 1.63)
AREA_DA   FAD+MAS+GLA    10.25     6.60                 9.19      [2.76, 33.60]
          CAR            32.05     9.28                 31.30     [15.94, 44.90]
          CON            14.00     10.77                14.77     [1.60, 43.39]
          ADI            50.78     33.93                44.59     [14.64, 164.07]
          BC_4, equal costs: 1.31 (1.16, 1.52)
          BC_4, unequal costs: 2.14 (1.99, 2.62)

Using the BC_4 values and their 95% CIs, the discriminatory ability of each feature is determined (under equal or unequal costs). All features perform better than chance. P and IP_max clearly perform better than AREA_DA under both equal and unequal costs, since the CIs around BC_4 for AREA_DA are higher than those of the other two. Under the carcinoma-weighted cost structure, the CIs for P, IP_max, and I_0 overlap, and therefore these features may be considered equally good. However, for both cost structures considered, P has the lowest estimate of BC_4 and also the lowest upper bound on the 95% CI, indicating the lowest maximum potential BC_4 value.

Choosing P to discriminate between all four classes with equal costs (with μ_FAD+MAS+GLA < μ_CAR < μ_CON < μ_ADI), the optimal thresholds (θ1* < θ2* < θ3*) and their 95% GCIs are θ1* = 402.21 (375.13, 441.14), θ2* = 643.20 (587.21, 717.27), and θ3* = 1540.50 (1387.497, 1665.80). The contingency table resulting from applying this classifier at its optimal point to the data is presented in Table 7.2. Choosing P to discriminate between all four classes with a higher cost on the misclassification of carcinoma, the optimal thresholds and their 95% GCIs are θ1* = 380.53 (353.07, 409.69), θ2* = 662.83 (596.61, 740.75), and θ3* = 1540.50 (1397.89, 1675.78). The contingency table resulting from applying this classifier to the data at its optimal point is also presented in Table 7.2. The two cost structures result in different estimates for θ1* and θ2*, but not for θ3*, demonstrating the impact that differing cost structures may have on determining the optimal thresholds.
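To make the threshold rule concrete, the sketch below assigns a single P measurement to a class using the two sets of estimated thresholds reported above (point estimates only). It assumes, as the ranges in Table 7.1 suggest, that the thresholds are on the original (untransformed) scale of P; the names and the tie rule for a value exactly on a threshold are our own choices. A measurement of P = 390 shows the effect of the cost structure: it falls below θ1* under equal costs but above it under carcinoma-weighted costs.

```python
import bisect

CLASSES = ["FAD+MAS+GLA", "CAR", "CON", "ADI"]  # ordered by increasing mean of P

# Estimated optimal thresholds on P from the text (point estimates only).
THRESHOLDS = {
    "equal": [402.21, 643.20, 1540.50],
    "carcinoma_weighted": [380.53, 662.83, 1540.50],
}


def classify_p(value, cost_structure="equal"):
    """Assign a class by locating the P value among the ordered thresholds.

    bisect_left counts the thresholds strictly below value, so a value exactly
    on a threshold goes to the lower class (an assumption, not from the text).
    """
    idx = bisect.bisect_left(THRESHOLDS[cost_structure], value)
    return CLASSES[idx]


if __name__ == "__main__":
    print(classify_p(390))                        # FAD+MAS+GLA
    print(classify_p(390, "carcinoma_weighted"))  # CAR
```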


Table 7.2: Contingency tables for classifying breast tissue using the length of the spectral curve (P). Rows give the predicted class and columns the true class, so each column sums to one.

Equal Costs
Predicted Class   True Class:
                  FAD+MAS+GLA   CAR    CON    ADI
FAD+MAS+GLA       0.90          0.24   0.00   0.00
CAR               0.10          0.71   0.21   0.00
CON               0.00          0.05   0.79   0.05
ADI               0.00          0.00   0.00   0.95

Unequal Costs
Predicted Class   True Class:
                  FAD+MAS+GLA   CAR    CON    ADI
FAD+MAS+GLA       0.80          0.14   0.00   0.00
CAR               0.20          0.86   0.29   0.00
CON               0.00          0.00   0.71   0.05
ADI               0.00          0.00   0.00   0.95

Using the thresholds that result from the cost structure which weights the misclassification of carcinoma higher, the correct classification rate for carcinoma increases from 71% to 86%. This results in 14% of CAR subjects being misclassified as FAD+MAS+GLA (also an abnormal state). None of the carcinoma cases are classified as either of the two normal classes (CON and ADI) when the weighted cost structure is used. In [61], linear discriminant analysis was used for the classification of subgroups of the six classes. Using this method, more than one feature may be considered at a time for discrimination. When discriminating only between two classes, CAR and FAD+MAS+GLA, they found that two features (AREA_DA and IP_max) resulted in the best classifier. Using this linear discrimination, they obtained approximately the same correct classification rate for CAR (86.36%) as we observed. However, our diagnostic tests are simpler (they depend on one feature, using simple cut-offs between classes) and simultaneously classify between all four classes. If distinctions between only CAR and FAD+MAS+GLA were of interest, higher correct classification rates could potentially be achieved using other features. Using linear discriminant analysis, the false negative rate may be altered by adjusting boundaries for a single class of interest; however, costs for all decisions cannot be accounted for a priori. Finally, the resulting classification rates for the connective tissue group are the worst, which may be a result of this group's departure from normality.

The CIs around BC reflect the uncertainty in each feature's ability to classify due to the variation of the data. Notably, as observed from the simulation results, the CI on BC is more robust than the CIs on the optimal thresholds for transformed data in the Box-Cox family (as in this application). Here, constructing a CI on BC allows the researcher to decide on the best feature (or test). In this study, P was found to be the best single feature for classifying breast tissue. Further study may be conducted to verify the optimal thresholds before implementing this feature for diagnosis in practice.
7.2 Classifying Chronic Allograft Nephropathy

After kidney transplant (KT), chronic allograft nephropathy (CAN) is one of the prevalent factors leading to renal transplant failure, yet its progression is still not well understood. Biopsy is a means of determining whether a patient has CAN; however, it is of interest to find less invasive methods for detecting progression towards CAN after KT. Due to the inflammatory response generated by tissue damage associated with CAN, it has been suggested that proinflammatory cytokine markers, such as transforming growth factor-β1, may provide an early indication of potential allograft loss [48]. Mas et al. conducted a study to evaluate gene panel mRNAs in urine samples for their usefulness as a non-invasive tool for evaluating graft function [37]. This study suggested that the biomarkers transforming growth factor-β1 (TGF-β1), angiotensinogen (AGT), and epidermal growth factor receptor (EGFR) (all measurable as mRNA levels in urine) could be useful as early predictors of allograft function [37]. Their study examined 32 patients with normal kidney function (NKF), 18 patients with normal kidney function and proteinuria (NKF+, a progression towards CAN), and 14 CAN patients, six months post transplant. Descriptive statistics of the three biomarkers within each diagnostic state are presented in Table 7.3, and a more detailed description of all the markers originally considered is given in [37].


Table 7.3: Descriptive statistics of three features (broken into three classes: NKF, NKF+, CAN) to classify kidney function.

Feature   Class    Mean    Standard Deviation   Median   Range
AGT       NKF      15.47   16.02                8.02     [1, 64]
          NKF+     4.76    6.30                 2.90     [0.11, 24.25]
          CAN      4.63    3.44                 4.15     [0.05, 9.85]
TGF-β1    NKF      1.56    1.22                 1.37     [0.13, 6.06]
          NKF+     32.75   128.85               1.04     [0.33, 548.75]
          NKF+†    2.39    4.58                 0.93     [0.33, 19.70]
          CAN      5.31    5.06                 3.26     [1.23, 19.70]
EGFR      NKF      15.41   15.34                9.71     [1, 64]
          NKF+     7.12    12.51                4.01     [0.11, 51.98]
          CAN      4.23    3.27                 3.65     [0.05, 9.85]

†These values exclude the extreme observation where TGF-β1 = 548.74.
Potential multi-class classifiers were evaluated in [57] using the volume under the surface (VUS) of the ROC manifold. The highest VUS (best classification performance) resulted from a classifier which simultaneously used both the AGT and TGF-β1 biomarkers, splitting the two-dimensional parameter space into regions for classification using arrays. However, the mathematical complexity of this classifier makes it hard to implement.

Instead, a simplified version of the classifier in [57], with practical rules using thresholds on the observed values of AGT and TGF-β1, may be used. Further, comparisons are made between different classifiers that use such rules with varying levels of complexity. First, Classifier 1 is a simpler classifier, using a single threshold value on each of the two biomarkers, TGF-β1 and AGT, respectively (θ = (θ1, θ2)):

Classifier 1:

Assign patient i to

class 3 (CAN) if x_TGF-β1,i > θ1

class 2 (NKF+) if x_TGF-β1,i ≤ θ1 and x_AGT,i < θ2

or class 1 (NKF) otherwise.

This classifier is plotted in Figure 7.1 (top) using the optimal threshold values, which were found by minimizing the empirically estimated BC with a simple grid search. The threshold values associated with the minimum BC (equal costs and prevalences are assumed for all misclassification outcomes) are θ = (2.55, 3.65). This classifier is represented with vertical and horizontal lines and has the advantage of requiring only two threshold values. For example, a subject whose TGF-β1 is 2.4 and whose AGT is 3.1 would be classified as NKF+, and a subject whose TGF-β1 is greater than 2.55, regardless of their AGT value, would be classified as CAN. Classifier 1 correctly classified 26 of 32 patients as NKF, 9 of 18 patients as NKF+, and 11 of 14 patients as CAN, and has a corresponding BC = 0.90 (see Table 7.4 for the full contingency table of classification outcomes).
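Classifier 1 is simple enough to write down directly. The sketch below encodes the rule with the estimated thresholds θ = (2.55, 3.65) as defaults; the class codes follow the text (1 = NKF, 2 = NKF+, 3 = CAN), while the function and constant names are ours. The first call reproduces the worked example above.

```python
# Class codes follow the text: 1 = NKF, 2 = NKF+, 3 = CAN.
NKF, NKF_PLUS, CAN = 1, 2, 3


def classifier_1(tgf_beta1, agt, theta=(2.55, 3.65)):
    """Classifier 1: a single cut on TGF-beta1 (theta[0]) and on AGT (theta[1])."""
    theta1, theta2 = theta
    if tgf_beta1 > theta1:
        return CAN
    if agt < theta2:  # reached only when tgf_beta1 <= theta1
        return NKF_PLUS
    return NKF


if __name__ == "__main__":
    print(classifier_1(2.4, 3.1))      # worked example from the text: 2 (NKF+)
    print(classifier_1(548.74, 4.59))  # the extreme TGF-beta1 observation: 3 (CAN)
```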
Figure 7.1: Plot of AGT vs. TGF-β1 with three-class classification systems (Top: Classifier 1, Bottom: Classifier 2) for classifying patients as NKF (▲), NKF+ (■), or CAN (*). These plots exclude the extreme observation in TGF-β1, where (TGF-β1, AGT) = (548.74, 4.59); however, this point is included in the classification.
A more complex variant of Classifier 1 is also proposed that allows the horizontal and vertical lines to have slope. This classifier, Classifier 2, considers non-rectangular regions in the AGT and TGF-β1 plane and requires four thresholds (θ = (θ1, θ2, θ3, θ4)):

Classifier 2:

Assign patient i to

class 1 (NKF) if x_AGT,i > [θ4 × x_TGF-β1,i − θ4θ3] and x_AGT,i > [θ2 × x_TGF-β1,i + θ1]

class 3 (CAN) if x_AGT,i ≤ [θ2 × x_TGF-β1,i + θ1]
or class 2 (NKF+) otherwise.

This classifier is plotted in Figure 7.1 (bottom) using the four optimal threshold values associated with the minimum BC, θ = (2.925, 1.45, 1.0, 5.0). Classifier 2 correctly classified 28 of 32 patients as NKF, 8 of 18 patients as NKF+, and 13 of 14 patients as CAN, with a corresponding BC = 0.75 (see Table 7.4 for the full contingency table of classification outcomes). Based on these point estimates of BC, Classifier 2 performs better than Classifier 1, demonstrating the potential utility of non-rectangular regions in this instance.
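Because equal costs and prevalences are assumed here, both reported BC values can be checked from the counts of correctly classified patients alone: BC behaves as the sum of the three class misclassification rates, (n_j − correct_j)/n_j. We infer this reduction from the fact that it reproduces both reported values; the function name is ours. The sketch uses exact fractions, and the difference of the two values is the point estimate of η used in the hypothesis test later in this section.

```python
from fractions import Fraction


def empirical_bc(correct, totals):
    """Empirical BC under unit costs and equal class weights: the sum of the
    class misclassification rates (n_j - correct_j) / n_j."""
    return sum(Fraction(n - c, n) for c, n in zip(correct, totals))


totals = (32, 18, 14)                    # NKF, NKF+, CAN
bc1 = empirical_bc((26, 9, 11), totals)  # Classifier 1
bc2 = empirical_bc((28, 8, 13), totals)  # Classifier 2
print(round(float(bc1), 3), round(float(bc2), 3), round(float(bc1 - bc2), 3))
# prints 0.902 0.752 0.15
```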


Table 7.4: Contingency tables for classifying subjects into three groups with respect to chronic allograft nephropathy. Rows give the true class and columns the predicted class, so each row sums to one.

Classifier 1 (BC = 0.90)
True Class   Predicted Class:
             NKF    NKF+   CAN
NKF          0.81   0.03   0.16
NKF+         0.28   0.50   0.22
CAN          0.07   0.14   0.79

Classifier 2 (BC = 0.75)
True Class   Predicted Class:
             NKF    NKF+   CAN
NKF          0.88   0.06   0.06
NKF+         0.22   0.44   0.33
CAN          0.00   0.07   0.93

These data involve small class sample sizes, non-normality of the biomarkers in each class (which do not transform to normality), and the requirement to use two biomarkers simultaneously to make the desired classifications. Therefore, the proposed fiducial interval from Chapter 4 can be used to construct a CI around the optimal BC for both classifiers.
Using the fiducial interval, a 95% CI for Classifier 1 is BC ∈ [0.56, 1.29] and for Classifier 2 is BC ∈ [0.44, 1.13]. Both CIs demonstrate that these classifiers perform better than chance because they do not span BC = 1.5. Although Classifier 2 reflects better classification for the CAN diagnostic state (13 instead of 11 patients correctly classified), the overlap of these two CIs indicates that Classifier 2 may not perform better than Classifier 1 across all diagnostic states.

A nonparametric hypothesis test may be conducted to formally test whether the more complex classifier (Classifier 2) performs better than the simpler classifier (Classifier 1). This was accomplished with the LRT developed in Section 6.3.2 for testing hypotheses on η. Based on the simulation results in Section 6.4.2, the LRT is appropriate for this application because for sample sizes of n_j = 10 or more the LRT maintained a size less than α. For this application,

$$ \eta = BC_{\text{Classifier 1}} - BC_{\text{Classifier 2}} \tag{2.6} $$

and the hypothesis being tested is

$$ H_0: \eta \le \eta_0 \quad \text{vs.} \quad H_1: \eta > \eta_0 \tag{5.3} $$

Using the LRT, the p-value for this test is 0.51 (η̂ = 0.15). The exact hypothesis test was shown with simulations in Section 6.4.2 to have higher power than the LRT for tests on η. However, although applying the exact test here might result in a slightly smaller p-value, the difference would not be enough to change the decision of the test at a significance level of 0.05. Therefore, the null hypothesis is not rejected, and there is not enough evidence to conclude that the more complex classifier performs better than the simpler classifier.

This application demonstrates the use of nonparametric inference methods on BC for a classifier using thresholds on a pair of biomarkers. Future work on associating the inflammatory response with diagnostic states leading to CAN may use these methods to compare combinations of alternative classifiers (e.g., random forests) and biomarkers to determine which best aids diagnosis of allograft function post transplant. This demonstrates an important use of flexible inference methods for BC.
VIII. Conclusions


The performance of classification systems at their optimal point is of great importance for classification methods. The commonly employed Youden index summarizes a classification system's performance at its optimal thresholds as the sum of correct classification rates. Bayes Cost, which is instead built from misclassification rates and minimized at the optimal point, has been shown to be a more flexible metric for characterizing the performance of a classification system because it allows any costs and prevalences to be placed on all class-specific misclassifications. In fact, due to the flexibility of BC, the methods developed in this dissertation may also be used for inference on J.

Although estimating BC and the optimal thresholds is of interest, quantifying the uncertainty in a classification system's performance is also of great practical use, especially if the classification system is not already determined, or if new or varying tests require comparison. Therefore, this work has developed new CI and hypothesis test methods for BC under parametric and nonparametric frameworks. CIs for k ≥ 3 classes were limited in the literature, and prior to this work, hypothesis tests had not been developed. Under parametric scenarios, the generalized inference methods were shown with simulation to outperform the inference methods based on the delta method. For nonparametric settings, exact inference methods were derived using the fiducial argument. These methods may require large computational time, and therefore a likelihood ratio test was also developed, which may be used as an approximate alternative to the exact hypothesis test when sample sizes are large enough. The proposed methods apply to any finite number of outcome classes.

BC can incorporate any cost structure on the correct and incorrect classification rates. However, it is possible to pick cost structures that would result in no optimal solution for the classification system [65]. Therefore, costs should be chosen with realistic concerns in mind. If the costs reflect the truth and no solution exists for the classification system, then the costs must be adjusted if possible or, more ideally, a better system should be found which can accommodate the necessary cost structure.

Future work may consider more efficient methods for calculating the exact fiducial interval bounds as well as for computing exact p-values, thereby conserving computational time and making the implementation of the exact methods easier. Also, the GCI performed well for a classification system with a single feature that is independently and normally distributed for each class. Therefore, it may be of interest to consider a generalized approach for inference on BC when the feature used for classification is not normal (e.g., gamma, chi-square, or mixtures). Finally, this work has assumed fixed prevalences for each class. However, the prevalence of a class may not be known explicitly. Future work may consider inference on BC when the prevalence of each class follows a known distribution, to account for a possible range of prevalence values. Under this framework, Bayesian methods may be employed to determine properties of Bayes Cost as well as corresponding credible sets for Bayes Cost and the optimal thresholds.
Appendix A: Mathematical Derivations and Support

A.1 Asymptotic Distribution of Sample Mean and Variance

In order to show $(\bar X_n, S_n^2) \xrightarrow{d} mvn$, some necessary theorems and definitions are presented first.
Definition 8 (Converges in Probability).

A sequence of random variables, $X_1, X_2, \ldots$, converges in probability to a random variable $X$ if, for every $\epsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| \ge \epsilon) = 0$ or, equivalently, $\lim_{n\to\infty} P(|X_n - X| < \epsilon) = 1$ [12, p. 232].

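Theorem 9 (Central Limit Theorem (CLT)).

Let $X_1, X_2, \ldots$ be a sequence of iid random variables with $E X_i = \mu$ and $0 < \operatorname{Var} X_i = \sigma^2 < \infty$. Define $\bar X_n = (1/n)\sum_{i=1}^{n} X_i$, and let $G_n(x)$ denote the cdf of $\sqrt{n}(\bar X_n - \mu)/\sigma$. Then, for any $x$, $-\infty < x < \infty$,

$$ \lim_{n\to\infty} G_n(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy; $$

that is, $\sqrt{n}(\bar X_n - \mu)/\sigma$ has a limiting standard normal distribution [12, p. 238].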

Theorem  10  (Slutsky’s  Theorem). 

IfX„  — >  X  in  distribution  and  ^  a  ,  a  constant,  in  probability,  then 
a.  YnX„  — >  aX  in  distribution 

b.  Xn  +  Yn  ^  X  +  a  in  distribution.  ]12,  pg.  239-240] 

Theorem  11. 

Let  Xi,X2, . . . ,  be  iid  f{x  \  6)  ,  let  9  denote  the  MLE  of  9  ,  and  let  t{9)  be  a  continuous 
function  of  9  .  Under  the  regularity  conditions  [...]  on  f{x  \  9)  and ,  hence,  L{9  \  x) , 

V^[t(?)  -  t(6I)]  -^n[0,v{9)] 

where  v(9)  is  the  Cramer-Rao  Lower  Bound.  That  is,  t{9)  is  a  consisten  and 
asymptotically  efficient  estimator  ofT(9)  ]12,  pg.  472] 


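Regularity conditions are presented in Section A.5 and are assumed for the normal distribution.
From the CLT it is clear that, for large $n$, $\bar X_n$ is approximately $n[\mu, \sigma^2/n]$. To see that the sample variance $S_n^2$ also has a limiting normal distribution, first note from Theorem 11 that $\sqrt{n}[\hat\sigma_n^2 - \sigma^2] \xrightarrow{d} n[0, v(\sigma^2)]$, where $\hat\sigma_n^2$ is the maximum likelihood estimator (MLE) of $\sigma^2$ and

$$\hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2 = \frac{n-1}{n}S_n^2. \tag{A.1}$$

Also, since $\hat\sigma_n^2$ converges in probability to $\sigma^2$ while $\sqrt{n}/(n-1) \to 0$, it is clear that

$$\lim_{n\to\infty} P\left(\left|\frac{\sqrt{n}}{n-1}\hat\sigma_n^2 - 0\right| < \epsilon\right) = 1, \tag{A.2}$$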
which implies from Definition 8 that $\frac{\sqrt{n}}{n-1}\hat\sigma_n^2 \xrightarrow{P} 0$. Now consider

$$\sqrt{n}\left(S_n^2 - \sigma^2\right) = \sqrt{n}\left(\frac{n}{n-1}\hat\sigma_n^2 - \sigma^2\right) = \sqrt{n}\left(\hat\sigma_n^2 - \sigma^2\right) + \frac{\sqrt{n}}{n-1}\hat\sigma_n^2. \tag{A.3}$$

Let $X_n = \sqrt{n}(\hat\sigma_n^2 - \sigma^2)$ and $Y_n = \frac{\sqrt{n}}{n-1}\hat\sigma_n^2$; then from Slutsky's Theorem,

$$\sqrt{n}\left(S_n^2 - \sigma^2\right) = X_n + Y_n \xrightarrow{d} n[0, v(\sigma^2)] + 0 = n[0, v(\sigma^2)].$$

Finally, since $\bar X_n$ and $S_n^2$ are independent for normal samples ([12, p. 218]), their asymptotic joint distribution is simply the product of their asymptotic normal marginals, which is the bivariate normal pdf with correlation $\rho = 0$. Therefore, $(\bar X_n, S_n^2) \sim \text{mvn}[(\mu, \sigma^2), (\sigma^2/n, v(\sigma^2))]$.

A.2 Derivation of partial derivatives of three-class Bayes Cost with respect to all distributional parameters.


Under the parametric model, the feature for class $j$ is distributed $n[\mu_j, \sigma_j^2]$, $j = 1, 2, 3$, and the thresholds $\theta_1 < \theta_2$ label an observation class 1 below $\theta_1$, class 2 between $\theta_1$ and $\theta_2$, and class 3 above $\theta_2$. Writing $z_{kj} = (\theta_k - \mu_j)/\sigma_j$, with $\Phi$ and $\phi$ the standard normal cdf and pdf, the three-class Bayes Cost is

$$\begin{aligned}
BC ={}& c_{2|1}p_1\left[\Phi(z_{21}) - \Phi(z_{11})\right] + c_{3|1}p_1\,\Phi(-z_{21}) \\
&+ c_{1|2}p_2\,\Phi(z_{12}) + c_{3|2}p_2\,\Phi(-z_{22}) \\
&+ c_{1|3}p_3\,\Phi(z_{13}) + c_{2|3}p_3\left[\Phi(z_{23}) - \Phi(z_{13})\right].
\end{aligned}$$

The optimal thresholds depend on all distributional parameters, so each derivative is taken with the chain rule. For any parameter $\psi \in \{\mu_1, \mu_2, \mu_3, \sigma_1, \sigma_2, \sigma_3\}$, using $\phi(-x) = \phi(x)$,

$$\begin{aligned}
\frac{\partial BC}{\partial\psi} ={}& c_{2|1}p_1\left[\phi(z_{21})\frac{\partial z_{21}}{\partial\psi} - \phi(z_{11})\frac{\partial z_{11}}{\partial\psi}\right] - c_{3|1}p_1\,\phi(z_{21})\frac{\partial z_{21}}{\partial\psi} \\
&+ c_{1|2}p_2\,\phi(z_{12})\frac{\partial z_{12}}{\partial\psi} - c_{3|2}p_2\,\phi(z_{22})\frac{\partial z_{22}}{\partial\psi} \\
&+ c_{1|3}p_3\,\phi(z_{13})\frac{\partial z_{13}}{\partial\psi} + c_{2|3}p_3\left[\phi(z_{23})\frac{\partial z_{23}}{\partial\psi} - \phi(z_{13})\frac{\partial z_{13}}{\partial\psi}\right],
\end{aligned}$$

where, with $\delta_{jl} = 1$ if $j = l$ and $0$ otherwise,

$$\frac{\partial z_{kj}}{\partial\mu_l} = \sigma_j^{-1}\left[\frac{\partial\theta_k}{\partial\mu_l} - \delta_{jl}\right], \qquad \frac{\partial z_{kj}}{\partial\sigma_l} = \sigma_j^{-1}\left[\frac{\partial\theta_k}{\partial\sigma_l} - \delta_{jl}\,z_{kj}\right].$$

Pulling out the standard deviations and collecting terms results in the partial derivatives with respect to the means,

$$\begin{aligned}
\frac{\partial BC}{\partial\mu_1} ={}& \sigma_1^{-1}\left[\left(\frac{\partial\theta_2}{\partial\mu_1} - 1\right)\phi(z_{21})\left(c_{2|1}p_1 - c_{3|1}p_1\right) - c_{2|1}p_1\,\phi(z_{11})\left(\frac{\partial\theta_1}{\partial\mu_1} - 1\right)\right] \\
&+ \sigma_2^{-1}\left[c_{1|2}p_2\,\phi(z_{12})\frac{\partial\theta_1}{\partial\mu_1} - c_{3|2}p_2\,\phi(z_{22})\frac{\partial\theta_2}{\partial\mu_1}\right] \\
&+ \sigma_3^{-1}\left[\frac{\partial\theta_1}{\partial\mu_1}\phi(z_{13})\left(c_{1|3}p_3 - c_{2|3}p_3\right) + c_{2|3}p_3\,\phi(z_{23})\frac{\partial\theta_2}{\partial\mu_1}\right],
\end{aligned}$$

$$\begin{aligned}
\frac{\partial BC}{\partial\mu_2} ={}& \sigma_1^{-1}\left[\frac{\partial\theta_2}{\partial\mu_2}\phi(z_{21})\left(c_{2|1}p_1 - c_{3|1}p_1\right) - c_{2|1}p_1\,\phi(z_{11})\frac{\partial\theta_1}{\partial\mu_2}\right] \\
&+ \sigma_2^{-1}\left[c_{1|2}p_2\,\phi(z_{12})\left(\frac{\partial\theta_1}{\partial\mu_2} - 1\right) - c_{3|2}p_2\,\phi(z_{22})\left(\frac{\partial\theta_2}{\partial\mu_2} - 1\right)\right] \\
&+ \sigma_3^{-1}\left[\frac{\partial\theta_1}{\partial\mu_2}\phi(z_{13})\left(c_{1|3}p_3 - c_{2|3}p_3\right) + c_{2|3}p_3\,\phi(z_{23})\frac{\partial\theta_2}{\partial\mu_2}\right],
\end{aligned}$$

$$\begin{aligned}
\frac{\partial BC}{\partial\mu_3} ={}& \sigma_1^{-1}\left[\frac{\partial\theta_2}{\partial\mu_3}\phi(z_{21})\left(c_{2|1}p_1 - c_{3|1}p_1\right) - c_{2|1}p_1\,\phi(z_{11})\frac{\partial\theta_1}{\partial\mu_3}\right] \\
&+ \sigma_2^{-1}\left[c_{1|2}p_2\,\phi(z_{12})\frac{\partial\theta_1}{\partial\mu_3} - c_{3|2}p_2\,\phi(z_{22})\frac{\partial\theta_2}{\partial\mu_3}\right] \\
&+ \sigma_3^{-1}\left[\left(\frac{\partial\theta_1}{\partial\mu_3} - 1\right)\phi(z_{13})\left(c_{1|3}p_3 - c_{2|3}p_3\right) + c_{2|3}p_3\,\phi(z_{23})\left(\frac{\partial\theta_2}{\partial\mu_3} - 1\right)\right],
\end{aligned}$$

and, continuing to simplify in the same way, with respect to the standard deviations,

$$\begin{aligned}
\frac{\partial BC}{\partial\sigma_1} ={}& \sigma_1^{-1}\left[\left(\frac{\partial\theta_2}{\partial\sigma_1} - z_{21}\right)\phi(z_{21})\left(c_{2|1}p_1 - c_{3|1}p_1\right) - c_{2|1}p_1\,\phi(z_{11})\left(\frac{\partial\theta_1}{\partial\sigma_1} - z_{11}\right)\right] \\
&+ \sigma_2^{-1}\left[c_{1|2}p_2\,\phi(z_{12})\frac{\partial\theta_1}{\partial\sigma_1} - c_{3|2}p_2\,\phi(z_{22})\frac{\partial\theta_2}{\partial\sigma_1}\right] \\
&+ \sigma_3^{-1}\left[\frac{\partial\theta_1}{\partial\sigma_1}\phi(z_{13})\left(c_{1|3}p_3 - c_{2|3}p_3\right) + c_{2|3}p_3\,\phi(z_{23})\frac{\partial\theta_2}{\partial\sigma_1}\right],
\end{aligned}$$

$$\begin{aligned}
\frac{\partial BC}{\partial\sigma_2} ={}& \sigma_1^{-1}\left[\frac{\partial\theta_2}{\partial\sigma_2}\phi(z_{21})\left(c_{2|1}p_1 - c_{3|1}p_1\right) - c_{2|1}p_1\,\phi(z_{11})\frac{\partial\theta_1}{\partial\sigma_2}\right] \\
&+ \sigma_2^{-1}\left[c_{1|2}p_2\,\phi(z_{12})\left(\frac{\partial\theta_1}{\partial\sigma_2} - z_{12}\right) - c_{3|2}p_2\,\phi(z_{22})\left(\frac{\partial\theta_2}{\partial\sigma_2} - z_{22}\right)\right] \\
&+ \sigma_3^{-1}\left[\frac{\partial\theta_1}{\partial\sigma_2}\phi(z_{13})\left(c_{1|3}p_3 - c_{2|3}p_3\right) + c_{2|3}p_3\,\phi(z_{23})\frac{\partial\theta_2}{\partial\sigma_2}\right],
\end{aligned}$$

$$\begin{aligned}
\frac{\partial BC}{\partial\sigma_3} ={}& \sigma_1^{-1}\left[\frac{\partial\theta_2}{\partial\sigma_3}\phi(z_{21})\left(c_{2|1}p_1 - c_{3|1}p_1\right) - c_{2|1}p_1\,\phi(z_{11})\frac{\partial\theta_1}{\partial\sigma_3}\right] \\
&+ \sigma_2^{-1}\left[c_{1|2}p_2\,\phi(z_{12})\frac{\partial\theta_1}{\partial\sigma_3} - c_{3|2}p_2\,\phi(z_{22})\frac{\partial\theta_2}{\partial\sigma_3}\right] \\
&+ \sigma_3^{-1}\left[\left(\frac{\partial\theta_1}{\partial\sigma_3} - z_{13}\right)\phi(z_{13})\left(c_{1|3}p_3 - c_{2|3}p_3\right) + c_{2|3}p_3\,\phi(z_{23})\left(\frac{\partial\theta_2}{\partial\sigma_3} - z_{23}\right)\right].
\end{aligned}$$
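The chain-rule pattern of this section can be spot-checked numerically. The Python sketch below is an illustration only: it evaluates the three-class Bayes Cost under normality with $z_{kj} = (\theta_k - \mu_j)/\sigma_j$, but it holds the thresholds fixed, so every $\partial\theta_k/\partial\psi$ term is set to zero (a simplifying assumption, not the optimal-threshold case derived above). Under that assumption the expressions for $\partial BC/\partial\mu_1$ and $\partial BC/\partial\sigma_1$ reduce to $(p_1/\sigma_1)[c_{2|1}\phi(z_{11}) - (c_{2|1} - c_{3|1})\phi(z_{21})]$ and $(p_1/\sigma_1)[c_{2|1}z_{11}\phi(z_{11}) - (c_{2|1} - c_{3|1})z_{21}\phi(z_{21})]$. The function names `bc3` and `dbc3_fixed` and all parameter values are hypothetical.

```python
import math


def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def bc3(theta, mu, sd, p, c):
    """Three-class Bayes Cost for fixed thresholds theta = (t1, t2), t1 < t2.
    c[(i, j)] is the cost c_{i|j} of labeling class j as class i."""
    def z(k, j):
        # z_{kj} = (theta_k - mu_j) / sigma_j, with 1-based k and j
        return (theta[k - 1] - mu[j - 1]) / sd[j - 1]
    return (c[(2, 1)] * p[0] * (Phi(z(2, 1)) - Phi(z(1, 1)))
            + c[(3, 1)] * p[0] * Phi(-z(2, 1))
            + c[(1, 2)] * p[1] * Phi(z(1, 2))
            + c[(3, 2)] * p[1] * Phi(-z(2, 2))
            + c[(1, 3)] * p[2] * Phi(z(1, 3))
            + c[(2, 3)] * p[2] * (Phi(z(2, 3)) - Phi(z(1, 3))))


def dbc3_fixed(theta, mu, sd, p, c):
    """(dBC/dmu_1, dBC/dsigma_1) with every d(theta_k)/d(psi) set to zero."""
    z11 = (theta[0] - mu[0]) / sd[0]
    z21 = (theta[1] - mu[0]) / sd[0]
    d21 = c[(2, 1)] - c[(3, 1)]
    d_mu1 = (p[0] / sd[0]) * (c[(2, 1)] * phi(z11) - d21 * phi(z21))
    d_sd1 = (p[0] / sd[0]) * (c[(2, 1)] * z11 * phi(z11) - d21 * z21 * phi(z21))
    return d_mu1, d_sd1


# Illustrative (made-up) parameters, compared with central differences.
theta, mu, sd = (1.0, 3.0), [0.0, 2.0, 4.0], [1.0, 1.2, 0.8]
p = [0.3, 0.4, 0.3]
c = {(2, 1): 1.0, (3, 1): 2.0, (1, 2): 1.0, (3, 2): 1.5, (1, 3): 3.0, (2, 3): 1.0}
h = 1e-6
fd_mu1 = (bc3(theta, [mu[0] + h] + mu[1:], sd, p, c)
          - bc3(theta, [mu[0] - h] + mu[1:], sd, p, c)) / (2 * h)
fd_sd1 = (bc3(theta, mu, [sd[0] + h] + sd[1:], p, c)
          - bc3(theta, mu, [sd[0] - h] + sd[1:], p, c)) / (2 * h)
print(dbc3_fixed(theta, mu, sd, p, c), (fd_mu1, fd_sd1))
```

With optimal thresholds, the $\partial\theta_k/\partial\psi$ terms would have to be supplied separately (for example from implicit differentiation of the optimality conditions); this sketch does not attempt that.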
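A.3 Partial derivatives of four-class Bayes Cost with respect to all distributional parameters.

For four classes with features distributed $n[\mu_j, \sigma_j^2]$, $j = 1, \ldots, 4$, and optimal thresholds $\theta_1^* < \theta_2^* < \theta_3^*$, the derivatives follow the same pattern as in Section A.2. Let $\Phi$ and $\phi$ denote the standard normal cdf and pdf and let $z_{kj} = (\theta_k^* - \mu_j)/\sigma_j$ for $k = 1, 2, 3$, with the conventions $\Phi(z_{0j}) = 0$ and $\Phi(z_{4j}) = 1$, so that the probability of labeling an observation from class $j$ as class $i$ is $\Phi(z_{ij}) - \Phi(z_{i-1,j})$. Then

$$BC = \sum_{j=1}^{4}\sum_{\substack{i=1\\ i\ne j}}^{4} c_{i|j}\,p_j\left[\Phi(z_{ij}) - \Phi(z_{i-1,j})\right],$$

and for any $\psi \in \{\mu_1, \ldots, \mu_4, \sigma_1, \ldots, \sigma_4\}$,

$$\frac{\partial BC}{\partial\psi} = \sum_{j=1}^{4}\sum_{\substack{i=1\\ i\ne j}}^{4} c_{i|j}\,p_j\left[\phi(z_{ij})\frac{\partial z_{ij}}{\partial\psi} - \phi(z_{i-1,j})\frac{\partial z_{i-1,j}}{\partial\psi}\right],$$

where the terms involving $z_{0j}$ and $z_{4j}$ are zero and, with $\delta_{jl} = 1$ if $j = l$ and $0$ otherwise,

$$\frac{\partial z_{kj}}{\partial\mu_l} = \sigma_j^{-1}\left[\frac{\partial\theta_k^*}{\partial\mu_l} - \delta_{jl}\right], \qquad \frac{\partial z_{kj}}{\partial\sigma_l} = \sigma_j^{-1}\left[\frac{\partial\theta_k^*}{\partial\sigma_l} - \delta_{jl}\,z_{kj}\right].$$

Evaluating this expression at each of the eight distributional parameters and collecting the terms multiplying $\sigma_1^{-1}, \ldots, \sigma_4^{-1}$ gives the partial derivatives (A.4) through (A.11).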
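The general $k$-class pattern can likewise be checked numerically. The Python sketch below is illustrative only: it computes the $k$-class Bayes Cost for a normally distributed feature with fixed, ordered thresholds (so again every $\partial\theta_k^*/\partial\psi$ term is assumed to be zero) and compares the resulting $\partial BC/\partial\mu_l$, where only class $l$ contributes and $\partial z_{kl}/\partial\mu_l = -\sigma_l^{-1}$, with central differences for a four-class example. The names `bc_k` and `dbc_dmu_fixed`, the cost matrix, and all parameter values are hypothetical.

```python
import math


def Phi(z):
    # Standard normal cdf; math.erf maps -inf/+inf to -1/+1.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    # Standard normal pdf; exp(-inf) = 0 covers the infinite cut points.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def bc_k(theta, mu, sd, p, c):
    """k-class BC for fixed ordered thresholds theta (length k - 1).
    c[i][j] is c_{i|j} with 0-based class indices; the diagonal is ignored."""
    k = len(mu)
    cuts = [-math.inf] + list(theta) + [math.inf]
    total = 0.0
    for j in range(k):
        for i in range(k):
            if i != j:
                prob = (Phi((cuts[i + 1] - mu[j]) / sd[j])
                        - Phi((cuts[i] - mu[j]) / sd[j]))
                total += c[i][j] * p[j] * prob
    return total


def dbc_dmu_fixed(l, theta, mu, sd, p, c):
    """dBC/dmu_l with thresholds held fixed: only class l contributes,
    and dz_{kl}/dmu_l = -1/sigma_l."""
    cuts = [-math.inf] + list(theta) + [math.inf]
    grad = 0.0
    for i in range(len(mu)):
        if i != l:
            upper = phi((cuts[i + 1] - mu[l]) / sd[l])
            lower = phi((cuts[i] - mu[l]) / sd[l])
            grad -= c[i][l] * p[l] * (upper - lower) / sd[l]
    return grad


# Illustrative four-class example with unequal costs.
theta = (1.0, 2.5, 4.0)
mu, sd, p = [0.0, 2.0, 3.5, 5.0], [1.0, 0.8, 1.1, 0.9], [0.25] * 4
c = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
h = 1e-6
for l in range(4):
    up = mu[:l] + [mu[l] + h] + mu[l + 1:]
    dn = mu[:l] + [mu[l] - h] + mu[l + 1:]
    fd = (bc_k(theta, up, sd, p, c) - bc_k(theta, dn, sd, p, c)) / (2 * h)
    print(l + 1, fd, dbc_dmu_fixed(l, theta, mu, sd, p, c))
```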
A.4 Wald and Log Wald CI for Bayes Cost

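The Wald method for constructing CIs is common and easily applied for large sample sizes,
though it may not perform as well as other methods. Based on the large-sample normality of
maximum likelihood estimators (MLEs), the statistic

$$z = \frac{\hat\theta - \theta_0}{S_{\hat\theta}} \tag{A.12}$$

is approximately standard normal when $\theta = \theta_0$ [3, p. 11], [70], where $S_{\hat\theta}$ is the estimated standard error of $\hat\theta$. The $(1 - \alpha)100\%$ Wald CI is then found as

$$\hat\theta \pm z_{1-\alpha/2} \times S_{\hat\theta}. \tag{A.13}$$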
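Equation A.13 translates directly into code. The short Python sketch below is a generic illustration (the function name `wald_ci` is hypothetical); it takes an estimate and its standard error and uses the standard library's `NormalDist` for the normal quantile $z_{1-\alpha/2}$.

```python
from statistics import NormalDist


def wald_ci(estimate, std_error, alpha=0.05):
    """(1 - alpha)100% Wald interval: estimate -/+ z_{1-alpha/2} * std_error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error


lo, hi = wald_ci(0.30, 0.05)
print(round(lo, 4), round(hi, 4))  # 0.202 0.398
```

Nothing in this form keeps the interval inside the parameter space; for a nonnegative quantity such as BC the lower limit can fall below zero, which motivates the log Wald interval of Section A.4.3.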


Although the Wald CI is easy to implement, it performs poorly with respect to coverage for binomial
probabilities [2]. Despite this, the Wald CI is considered for BC because it is easily computed and
is a good baseline for comparing newly developed methods, and it may perform better for a sum of
binomial/multinomial probabilities (i.e., BC) than for binomial probabilities directly.

A.4.1 Bayes Cost for a two-class classification system.

Consider a two-class classification system with results tabulated in a contingency table as in
Table 2.3. Class one has $n_1 = X_{1|1} + X_{2|1}$ observations and class two has $n_2 = X_{1|2} + X_{2|2}$ observations
(with $n_1$ and $n_2$ fixed). The outcomes from each class are mutually exclusive and independently
distributed, and the classification system labels each observation in a class as only one of the two
possible outcomes. No distributional assumptions are made on the feature or features used for
classification. In [76], $J$ is defined as the maximum of the sum of correct classification rates minus
one, which can be written as

$$J = \max_{\theta\in\Theta}\left\{\frac{X_{1|1}}{n_1} + \frac{X_{2|2}}{n_2} - 1\right\}, \tag{A.14}$$

where $X_{1|1}$ and $X_{2|2}$ are the random variables representing the numbers of observations correctly
classified for a vector of thresholds $\theta \in \Theta$. Bayes Cost, which is defined to minimize the
misclassification rates instead of maximizing the correct classification rates, can be used similarly.
In the nonparametric framework, BC (with cost and prevalence multipliers all equal to one) may be written

$$BC = \min_{\theta\in\Theta}\left\{\frac{X_{2|1}}{n_1} + \frac{X_{1|2}}{n_2}\right\}, \tag{A.15}$$

where $X_{2|1}$ and $X_{1|2}$ are the random variables representing the misclassified observations for a $\theta \in \Theta$.
The expected value and variance of BC defined in Equation A.15 are determined, for a fixed $\theta$, using
properties of the binomial distribution:


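$$\begin{aligned}
E(BC) &= E\left(\frac{X_{2|1}}{n_1} + \frac{X_{1|2}}{n_2}\right) = \frac{1}{n_1}E(X_{2|1}) + \frac{1}{n_2}E(X_{1|2}) \\
&= \frac{P_{2|1}(\theta) \times n_1}{n_1} + \frac{P_{1|2}(\theta) \times n_2}{n_2} = P_{2|1}(\theta) + P_{1|2}(\theta)
\end{aligned} \tag{A.16}$$

$$\begin{aligned}
\operatorname{Var}(BC) &= \operatorname{Var}\left(\frac{X_{2|1}}{n_1} + \frac{X_{1|2}}{n_2}\right) = \frac{1}{n_1^2}\operatorname{Var}(X_{2|1}) + \frac{1}{n_2^2}\operatorname{Var}(X_{1|2}) \\
&= \frac{P_{2|1}(\theta) \times P_{1|1}(\theta) \times n_1}{n_1^2} + \frac{P_{1|2}(\theta) \times P_{2|2}(\theta) \times n_2}{n_2^2} \\
&= \frac{P_{2|1}(\theta)\,P_{1|1}(\theta)}{n_1} + \frac{P_{1|2}(\theta)\,P_{2|2}(\theta)}{n_2}
\end{aligned} \tag{A.17}$$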
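Equations A.16 and A.17 can be sanity-checked by simulation. The Python sketch below is an illustration under assumed values: it fixes a single threshold, draws independent binomial counts $X_{2|1} \sim \text{Bin}(n_1, P_{2|1})$ and $X_{1|2} \sim \text{Bin}(n_2, P_{1|2})$, and compares the Monte Carlo mean and variance of $X_{2|1}/n_1 + X_{1|2}/n_2$ with the formulas (using $P_{1|1} = 1 - P_{2|1}$ and $P_{2|2} = 1 - P_{1|2}$). The helper `simulate_bc` and the probabilities and sample sizes are hypothetical.

```python
import random
import statistics


def simulate_bc(p21, p12, n1, n2, reps, seed=7):
    """Monte Carlo draws of X_{2|1}/n1 + X_{1|2}/n2 for one fixed threshold,
    with X_{2|1} ~ Bin(n1, p21) and X_{1|2} ~ Bin(n2, p12) independent."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        x21 = sum(rng.random() < p21 for _ in range(n1))
        x12 = sum(rng.random() < p12 for _ in range(n2))
        draws.append(x21 / n1 + x12 / n2)
    return draws


p21, p12, n1, n2 = 0.2, 0.1, 50, 40
draws = simulate_bc(p21, p12, n1, n2, reps=10000)
mean_theory = p21 + p12                                     # Equation A.16
var_theory = p21 * (1 - p21) / n1 + p12 * (1 - p12) / n2    # Equation A.17
print(statistics.fmean(draws), mean_theory)
print(statistics.variance(draws), var_theory)
```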
where $P_{i|j}(\theta)$ is the true probability of classifying class $j$ as class $i$ for a given $\theta \in \Theta$ and $n_j$ is the
total number of observations sampled from class $j$. Using the MLEs $\hat P_{i|j}(\theta) = x_{i|j}/n_j$ for $P_{i|j}(\theta)$ (from the
binomial distribution presented in Section 2.7.1), BC and the variance of BC are estimated as

$$\widehat{BC} = \frac{x_{2|1}}{n_1} + \frac{x_{1|2}}{n_2} \tag{A.18}$$

$$\widehat{\operatorname{Var}}(\widehat{BC}) = \frac{x_{2|1}\,x_{1|1}}{n_1^3} + \frac{x_{1|2}\,x_{2|2}}{n_2^3} \tag{A.19}$$

For greater utility, BC may be defined with prevalences on classes and different costs on
misclassification errors [58, 65] such that

$$BC = \min_{\theta\in\Theta}\left\{c_{2|1}p_1\frac{X_{2|1}}{n_1} + c_{1|2}p_2\frac{X_{1|2}}{n_2}\right\}, \tag{A.20}$$


where $c_{i|j}$ is the fixed cost associated with misclassifying class $j$ as class $i$ and $p_j$ is the fixed
prevalence for class $j$. The expected value and variance of BC defined in Equation A.20 are

$$E[BC] = p_1c_{2|1}P_{2|1}(\theta) + p_2c_{1|2}P_{1|2}(\theta) \tag{A.21}$$

$$\operatorname{Var}[BC] = \left(p_1c_{2|1}\right)^2\frac{P_{2|1}(\theta)\,P_{1|1}(\theta)}{n_1} + \left(p_2c_{1|2}\right)^2\frac{P_{1|2}(\theta)\,P_{2|2}(\theta)}{n_2} \tag{A.22}$$

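Once again, using the MLEs for the binomial proportions $P_{i|j}(\theta)$, BC and the variance of BC are estimated as

$$\widehat{BC} = p_1c_{2|1}\frac{x_{2|1}}{n_1} + p_2c_{1|2}\frac{x_{1|2}}{n_2} \tag{A.23}$$

$$\widehat{\operatorname{Var}}(\widehat{BC}) = \left(p_1c_{2|1}\right)^2\frac{x_{2|1}\,x_{1|1}}{n_1^3} + \left(p_2c_{1|2}\right)^2\frac{x_{1|2}\,x_{2|2}}{n_2^3} \tag{A.24}$$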
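Equations A.23 and A.24 can be computed directly from the four cells of the contingency table. The Python sketch below is illustrative (the function name `two_class_bc` and the example counts are hypothetical); with its default unit prevalences and costs it reproduces Equations A.18 and A.19. It evaluates a single threshold setting; the minimum over $\Theta$ would be obtained by repeating the calculation for each candidate threshold.

```python
def two_class_bc(x11, x21, x12, x22, p1=1.0, p2=1.0, c21=1.0, c12=1.0):
    """Plug-in two-class BC and its variance at one threshold (Equations A.23-A.24).
    x_ij is the count of class-j observations labeled class i; the defaults
    (unit prevalences and costs) reduce to Equations A.18-A.19."""
    n1, n2 = x11 + x21, x12 + x22
    bc = p1 * c21 * x21 / n1 + p2 * c12 * x12 / n2
    var = ((p1 * c21) ** 2 * x21 * x11 / n1**3
           + (p2 * c12) ** 2 * x12 * x22 / n2**3)
    return bc, var


# 50 observations per class: 10 of class 1 and 5 of class 2 are misclassified.
print(two_class_bc(40, 10, 5, 45))  # BC near 0.3, variance near 0.005
```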

A.4.2 Bayes Cost for a k-class classification system.

Consider a classification system with three or more classes where the diagnostic outcomes may
be tabulated in a contingency table as in Table 2.4, for a given $\theta \in \Theta$. Once again, no distributional
assumptions are made on the feature or features used for classification. For the three-class example,
the first class has $n_1 = x_{1|1} + x_{2|1} + x_{3|1}$ observations, the second class has $n_2 = x_{1|2} + x_{2|2} + x_{3|2}$,
and the third class has $n_3 = x_{1|3} + x_{2|3} + x_{3|3}$ observations (where $n_1$, $n_2$, and $n_3$ are all assumed
fixed). The outcomes from the classes are mutually exclusive with independent distributions, and
the classification system labels each observation as only one of the three (or $k$ for $k$ classes) possible
outcomes. Therefore, the numbers of outcomes in each diagnostic state within a single class (or column
in the contingency table) are multinomially distributed (see Section 2.7.2). Similar to the two-class
classification system, BC is defined with cost and class prevalence multipliers:


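$$BC = \min_{\theta\in\Theta}\left\{\sum_{j=1}^{3}\sum_{\substack{i=1\\ i\ne j}}^{3} c_{i|j}\,p_j\,\frac{X_{i|j}}{n_j}\right\} \tag{A.25}$$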


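The expected value and variance of BC are determined directly from the properties of the
multinomial distribution, taking into account the covariances between outcomes within the same
class. Therefore,

$$E(BC) = \sum_{j=1}^{3}\sum_{\substack{i=1\\ i\ne j}}^{3} c_{i|j}\,p_j\,P_{i|j}(\theta) \tag{A.26}$$

and

$$\operatorname{Var}(BC) = \sum_{j=1}^{3}\frac{p_j^2}{n_j}\left[\sum_{\substack{i=1\\ i\ne j}}^{3} c_{i|j}^2\,P_{i|j}(\theta)\left(1 - P_{i|j}(\theta)\right) - 2\sum_{\substack{i<l\\ i,l\ne j}} c_{i|j}\,c_{l|j}\,P_{i|j}(\theta)\,P_{l|j}(\theta)\right] \tag{A.27}$$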


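The MLEs for the multinomial distribution, $\hat P_{i|j} = x_{i|j}/n_j$, are used to estimate BC and the variance of BC as
follows:

$$\widehat{BC} = \sum_{j=1}^{3}\sum_{\substack{i=1\\ i\ne j}}^{3} c_{i|j}\,p_j\,\frac{x_{i|j}}{n_j} \tag{A.28}$$

and

$$\widehat{\operatorname{Var}}(\widehat{BC}) = \sum_{j=1}^{3}\frac{p_j^2}{n_j}\left[\sum_{\substack{i=1\\ i\ne j}}^{3} c_{i|j}^2\,\hat P_{i|j}\left(1 - \hat P_{i|j}\right) - 2\sum_{\substack{i<l\\ i,l\ne j}} c_{i|j}\,c_{l|j}\,\hat P_{i|j}\,\hat P_{l|j}\right] \tag{A.29}$$

Equation A.25 can be generalized for $k$ classes as [58]

$$BC = \min_{\theta\in\Theta}\left\{\sum_{j=1}^{k}\sum_{\substack{i=1\\ i\ne j}}^{k} c_{i|j}\,p_j\,\frac{X_{i|j}}{n_j}\right\} \tag{A.30}$$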


Further, $\widehat{BC}$ and $\widehat{\operatorname{Var}}(\widehat{BC})$ for any $k$-class BC are found similarly to Equations A.28 and A.29 using the
mean, variance, and covariance of multinomial random variables. Although an equivalence between
the optimal threshold for the two-class BC and the GYI optimal threshold exists (see Theorem 3,
Section 2.5.2), for $k \ge 3$ classes this equivalence of optimal thresholds does not universally hold,
specifically when the costs of misclassification within a single class or between classes are not
equal [58].
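The multinomial variance in Equations A.28 and A.29 extends to any number of classes by summing, within each column of the contingency table, the cell variances and the negative within-class covariances. The Python sketch below is one possible implementation under that reading (the function name `k_class_bc` and the example counts are hypothetical); it evaluates one threshold setting and, for two classes, reduces to Equations A.23 and A.24.

```python
def k_class_bc(counts, prevalence, cost):
    """Plug-in k-class BC and its variance at one threshold setting.
    counts[i][j] = x_{i|j} (row = assigned label i, column = true class j);
    prevalence[j] = p_j; cost[i][j] = c_{i|j} (diagonal ignored)."""
    k = len(counts)
    bc, var = 0.0, 0.0
    for j in range(k):
        n_j = sum(counts[i][j] for i in range(k))
        phat = [counts[i][j] / n_j for i in range(k)]
        off = [i for i in range(k) if i != j]  # misclassification cells
        bc += sum(cost[i][j] * prevalence[j] * phat[i] for i in off)
        # Multinomial cell variances minus twice the within-class covariances.
        v = sum(cost[i][j] ** 2 * phat[i] * (1 - phat[i]) for i in off)
        v -= 2 * sum(cost[i][j] * cost[l][j] * phat[i] * phat[l]
                     for a, i in enumerate(off) for l in off[a + 1:])
        var += prevalence[j] ** 2 * v / n_j
    return bc, var


# Three classes, 10 observations each, unit costs and equal prevalences.
counts = [[8, 1, 0], [2, 7, 2], [0, 2, 8]]
unit = [[0 if i == j else 1 for j in range(3)] for i in range(3)]
print(k_class_bc(counts, [1 / 3] * 3, unit))
```

For class 2 in this example the two misclassification cells have estimated probabilities 0.1 and 0.2, so the bracketed term is $0.09 + 0.16 - 2(0.02) = 0.21$, which equals $0.3 \times 0.7$, the variance of the combined misclassification proportion, as the covariance adjustment requires.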
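A.4.3 Wald and Log Wald Confidence Intervals.

A $(1 - \alpha)100\%$ Wald CI for the $k$-class BC ($k = 2, 3, \ldots$) is

$$\widehat{BC} \pm z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\widehat{BC})} \tag{A.31}$$

where $\widehat{BC}$ and $\widehat{\operatorname{Var}}(\widehat{BC})$ are found nonparametrically as in Sections A.4.1 and A.4.2. Since BC
is bounded below by zero, a CI around the natural logarithm of BC is also considered in order to ensure
that the CI lies above zero [41, p. 163]. The $(1 - \alpha)100\%$ Wald CI around the log of BC is

$$\log(\widehat{BC}) \pm z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\log\widehat{BC})} \tag{A.32}$$

Then the $(1 - \alpha)100\%$ log Wald CI for BC is

$$\widehat{BC} \times \exp\left[\pm z_{1-\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\log\widehat{BC})}\right] \tag{A.33}$$

where the delta method is used to approximate $\widehat{\operatorname{Var}}(\log\widehat{BC})$ as

$$\widehat{\operatorname{Var}}(\log\widehat{BC}) \approx \left(\frac{\partial\log BC}{\partial BC}\right)^2\widehat{\operatorname{Var}}(\widehat{BC}) = \frac{1}{\widehat{BC}^2}\widehat{\operatorname{Var}}(\widehat{BC}) \tag{A.34}$$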
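Equations A.31 through A.34 combine into a few lines of code. The Python sketch below is illustrative (the name `bc_wald_intervals` and the inputs are hypothetical): it takes a plug-in $\widehat{BC}$ and $\widehat{\operatorname{Var}}(\widehat{BC})$, for example from Equations A.23-A.24 or A.28-A.29, and returns both intervals. It requires $\widehat{BC} > 0$, since the log interval is undefined otherwise.

```python
import math
from statistics import NormalDist


def bc_wald_intervals(bc_hat, var_hat, alpha=0.05):
    """Wald CI (A.31) and log Wald CI (A.33) for BC, using the delta-method
    variance Var(log BC) ~ Var(BC) / BC^2 from (A.34). Requires bc_hat > 0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(var_hat)
    wald = (bc_hat - half, bc_hat + half)
    half_log = z * math.sqrt(var_hat / bc_hat**2)
    log_wald = (bc_hat * math.exp(-half_log), bc_hat * math.exp(half_log))
    return wald, log_wald


wald, log_wald = bc_wald_intervals(0.05, 0.002)
print(wald)      # lower limit falls below zero
print(log_wald)  # both limits stay positive
```

For a small estimated BC with a comparatively large variance, as here, the Wald lower limit is negative while the log Wald limits stay positive. The log interval is asymmetric on the original scale; the product of its two limits equals $\widehat{BC}^2$.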
A.5 Regularity Conditions

Regularity conditions required for Theorem 11 are given in [12, p. 516] and listed below. These
conditions are assumed for the normal and multinomial distributions, which are exponential family
distributions.

(A1) We observe $X_1, \ldots, X_n$, where $X_i \sim f(x \mid \theta)$ are iid.

(A2) The parameter is identifiable; that is, if $\theta \ne \theta'$, then $f(x \mid \theta) \ne f(x \mid \theta')$.

(A3) The densities $f(x \mid \theta)$ have common support, and $f(x \mid \theta)$ is differentiable in $\theta$.

(A4) The parameter space $\Omega$ contains an open set $\omega$ of which the true parameter value $\theta_0$ is an interior point.

(A5) For every $x \in \mathcal{X}$, the density $f(x \mid \theta)$ is three times differentiable with respect to $\theta$, the third derivative is continuous in $\theta$, and $\int f(x \mid \theta)\,dx$ can be differentiated three times under the integral sign.

(A6) For any $\theta_0 \in \Omega$, there exists a positive number $c$ and a function $M(x)$ (both of which may depend on $\theta_0$) such that $\left|\frac{\partial^3}{\partial\theta^3}\log f(x \mid \theta)\right| \le M(x)$ for all $x \in \mathcal{X}$, $\theta_0 - c < \theta < \theta_0 + c$, with $E_{\theta_0}[M(X)] < \infty$.
Appendix B: Additional Tables
B.1 Parametric Confidence Interval Simulation Tables


Table B.1: Coverage probability and length for parametric 95% CIs around BC under equal costs
and three classes with a normally distributed feature.

| Feature | n | BC₃ | Delta Cov | Delta Len | GCI Cov | GCI Len | BCa Cov | BCa Len | BP Cov | BP Len | AN Cov | AN Len |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal, σ₃ = 1 | 10 | 1.23 | 92.60 | 0.70 | 96.13 | 0.67 | 88.20 | 0.63 | 90.26 | 0.68 | 83.82 | 0.68 |
| | | 0.91 | 91.98 | 0.69 | 96.03 | 0.68 | 89.78 | 0.63 | 89.36 | 0.66 | 85.54 | 0.66 |
| | | 0.63 | 91.78 | 0.65 | 96.07 | 0.65 | 89.88 | 0.58 | 87.52 | 0.58 | 85.16 | 0.58 |
| | | 0.42 | 91.02 | 0.57 | 95.83 | 0.60 | 89.42 | 0.51 | 85.24 | 0.48 | 84.66 | 0.47 |
| | | 0.27 | 90.04 | 0.46 | 95.47 | 0.52 | 89.36 | 0.41 | 83.12 | 0.37 | 84.98 | 0.37 |
| | 50 | 1.23 | 94.80 | 0.32 | 95.63 | 0.32 | 94.48 | 0.32 | 94.42 | 0.32 | 93.24 | 0.32 |
| | | 0.91 | 94.78 | 0.32 | 95.70 | 0.32 | 94.46 | 0.31 | 94.28 | 0.31 | 93.52 | 0.31 |
| | | 0.63 | 94.44 | 0.30 | 95.77 | 0.30 | 94.14 | 0.29 | 93.64 | 0.29 | 93.00 | 0.29 |
| | | 0.42 | 94.02 | 0.26 | 95.43 | 0.26 | 94.12 | 0.26 | 93.08 | 0.25 | 93.20 | 0.25 |
| | | 0.27 | 94.50 | 0.21 | 95.50 | 0.22 | 94.00 | 0.21 | 92.58 | 0.20 | 93.20 | 0.20 |
| | 100 | 1.23 | 94.56 | 0.23 | 95.47 | 0.23 | 94.30 | 0.23 | 94.44 | 0.23 | 93.48 | 0.23 |
| | | 0.91 | 94.48 | 0.23 | 94.97 | 0.22 | 94.46 | 0.23 | 94.26 | 0.22 | 93.60 | 0.22 |
| | | 0.63 | 95.12 | 0.21 | 94.93 | 0.21 | 94.22 | 0.21 | 93.84 | 0.21 | 93.80 | 0.21 |
| | | 0.42 | 94.72 | 0.18 | 94.73 | 0.19 | 94.12 | 0.18 | 93.66 | 0.18 | 93.88 | 0.18 |
| | | 0.27 | 94.70 | 0.15 | 94.57 | 0.15 | 94.12 | 0.15 | 93.74 | 0.15 | 93.60 | 0.15 |
| | 250 | 1.23 | 95.04 | 0.14 | 95.20 | 0.14 | 94.98 | 0.14 | 94.86 | 0.14 | 94.90 | 0.14 |
| | | 0.91 | 94.82 | 0.14 | 95.30 | 0.14 | 94.86 | 0.14 | 94.58 | 0.14 | 94.88 | 0.14 |
| | | 0.63 | 94.76 | 0.13 | 95.03 | 0.13 | 94.72 | 0.13 | 94.16 | 0.13 | 94.42 | 0.13 |
| | | 0.42 | 94.70 | 0.12 | 94.77 | 0.12 | 94.54 | 0.12 | 94.24 | 0.12 | 94.62 | 0.12 |
| | | 0.27 | 94.66 | 0.09 | 94.80 | 0.10 | 94.36 | 0.10 | 94.24 | 0.09 | 94.52 | 0.09 |
| Normal, σ₃ = 2 | 10 | 1.23 | 92.44 | 0.76 | 96.33 | 0.73 | 90.24 | 0.72 | 89.86 | 0.74 | 87.48 | 0.74 |
| | | 0.91 | 91.66 | 0.74 | 96.57 | 0.73 | 91.44 | 0.70 | 89.52 | 0.70 | 86.74 | 0.70 |
| | | 0.63 | 91.38 | 0.67 | 96.30 | 0.69 | 91.02 | 0.63 | 87.78 | 0.61 | 85.88 | 0.60 |
| | | 0.42 | 90.74 | 0.58 | 96.27 | 0.62 | 90.08 | 0.53 | 85.96 | 0.50 | 85.10 | 0.49 |
| | | 0.27 | 90.20 | 0.47 | 96.30 | 0.52 | 89.38 | 0.43 | 83.74 | 0.38 | 85.18 | 0.37 |
| | 50 | 1.23 | 94.54 | 0.35 | 95.43 | 0.35 | 94.74 | 0.35 | 94.38 | 0.35 | 93.80 | 0.35 |
| | | 0.91 | 94.36 | 0.34 | 95.73 | 0.34 | 94.40 | 0.34 | 94.22 | 0.34 | 93.54 | 0.34 |
| | | 0.63 | 94.04 | 0.31 | 95.73 | 0.31 | 94.26 | 0.31 | 93.82 | 0.30 | 93.62 | 0.30 |
| | | 0.42 | 93.78 | 0.27 | 95.83 | 0.27 | 94.14 | 0.26 | 93.44 | 0.26 | 93.12 | 0.26 |
| | | 0.27 | 93.40 | 0.21 | 95.57 | 0.22 | 93.92 | 0.21 | 92.90 | 0.20 | 93.34 | 0.20 |
| | 100 | 1.23 | 95.08 | 0.25 | 95.07 | 0.25 | 94.30 | 0.25 | 94.18 | 0.25 | 94.18 | 0.25 |
| | | 0.91 | 95.34 | 0.24 | 95.20 | 0.24 | 94.38 | 0.24 | 94.24 | 0.24 | 94.02 | 0.24 |
| | | 0.63 | 95.02 | 0.22 | 95.10 | 0.22 | 94.40 | 0.22 | 94.04 | 0.22 | 94.26 | 0.22 |
| | | 0.42 | 94.86 | 0.19 | 95.20 | 0.19 | 94.30 | 0.19 | 94.02 | 0.19 | 94.10 | 0.19 |
| | | 0.27 | 94.74 | 0.15 | 94.90 | 0.15 | 94.26 | 0.15 | 93.60 | 0.15 | 94.20 | 0.15 |
| | 250 | 1.23 | 95.12 | 0.16 | 95.17 | 0.16 | 94.94 | 0.16 | 94.88 | 0.16 | 94.86 | 0.16 |
| | | 0.91 | 95.02 | 0.15 | 95.17 | 0.15 | 94.84 | 0.15 | 94.66 | 0.15 | 94.86 | 0.15 |
| | | 0.63 | 94.68 | 0.14 | 95.07 | 0.14 | 94.78 | 0.14 | 94.72 | 0.14 | 94.68 | 0.14 |
| | | 0.42 | 94.90 | 0.12 | 95.13 | 0.12 | 94.52 | 0.12 | 94.28 | 0.12 | 94.58 | 0.15 |
| | | 0.27 | 94.68 | 0.10 | 95.07 | 0.10 | 94.46 | 0.10 | 94.52 | 0.09 | 94.60 | 0.10 |
| Normal, σ₃ = 4 | 10 | 1.23 | 92.28 | 0.77 | 96.43 | 0.75 | 91.84 | 0.75 | 90.40 | 0.76 | 89.28 | 0.76 |
| | | 0.91 | 92.02 | 0.74 | 96.77 | 0.75 | 93.00 | 0.73 | 90.50 | 0.72 | 87.78 | 0.72 |
| | | 0.63 | 91.56 | 0.67 | 96.90 | 0.70 | 92.26 | 0.65 | 88.94 | 0.62 | 86.68 | 0.62 |
| | | 0.42 | 90.98 | 0.58 | 96.70 | 0.62 | 91.36 | 0.55 | 86.96 | 0.50 | 85.60 | 0.50 |
| | | 0.27 | 90.50 | 0.47 | 96.80 | 0.52 | 89.88 | 0.44 | 84.56 | 0.38 | 85.62 | 0.38 |
| | 50 | 1.23 | 94.36 | 0.36 | 95.70 | 0.35 | 94.78 | 0.36 | 94.22 | 0.35 | 94.00 | 0.36 |
| | | 0.91 | 94.12 | 0.34 | 95.53 | 0.34 | 94.86 | 0.34 | 94.52 | 0.34 | 93.90 | 0.34 |
| | | 0.63 | 93.98 | 0.31 | 95.77 | 0.31 | 94.62 | 0.31 | 94.06 | 0.30 | 93.64 | 0.31 |
| | | 0.42 | 93.60 | 0.27 | 95.90 | 0.27 | 94.24 | 0.27 | 93.50 | 0.26 | 93.28 | 0.26 |
| | | 0.27 | 93.26 | 0.21 | 95.73 | 0.22 | 93.88 | 0.21 | 92.82 | 0.20 | 93.32 | 0.20 |
| | 100 | 1.23 | 95.28 | 0.25 | 95.10 | 0.25 | 94.30 | 0.25 | 94.16 | 0.25 | 93.98 | 0.25 |
| | | 0.91 | 95.04 | 0.24 | 95.17 | 0.24 | 94.26 | 0.24 | 94.40 | 0.24 | 94.34 | 0.24 |
| | | 0.63 | 94.86 | 0.22 | 95.10 | 0.22 | 94.40 | 0.22 | 94.20 | 0.22 | 94.14 | 0.22 |
| | | 0.42 | 94.62 | 0.19 | 95.13 | 0.19 | 94.26 | 0.19 | 94.02 | 0.19 | 94.38 | 0.19 |
| | | 0.27 | 94.62 | 0.15 | 95.30 | 0.15 | 94.04 | 0.15 | 93.90 | 0.15 | 94.16 | 0.15 |
| | 250 | 1.23 | 95.14 | 0.16 | 94.87 | 0.16 | 94.72 | 0.16 | 94.78 | 0.16 | 94.96 | 0.16 |
| | | 0.91 | 95.06 | 0.15 | 95.07 | 0.15 | 94.90 | 0.15 | 94.86 | 0.15 | 94.98 | 0.15 |
| | | 0.63 | 94.90 | 0.14 | 94.97 | 0.14 | 94.86 | 0.14 | 94.70 | 0.14 | 94.72 | 0.14 |
| | | 0.42 | 94.76 | 0.12 | 94.93 | 0.12 | 94.80 | 0.12 | 94.58 | 0.12 | 94.90 | 0.12 |
| | | 0.27 | 94.68 | 0.10 | 95.13 | 0.10 | 94.74 | 0.10 | 94.74 | 0.09 | 94.82 | 0.10 |

GCI - generalized confidence interval; BCa - bias corrected and accelerated; BP - basic percentile;
AN - asymptotic normal; Cov - coverage; Len - length
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Table  B.2:  Coverage  probability  and  length  for  95%  parametrie  CIs  around  BC  under  equal  eosts 
and  three  elasses  with  a  non-normally  distributed  feature. 


Delta  GCI  BCa  BP  AN 


BC^ 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Gamma 

10 

1.23 

88.68 

0.66 

92.20 

0.64 

89.96 

0.73 

90.20 

0.72 

79.44 

0.72 

0.91 

88.94 

0.82 

95.83 

0.68 

90.08 

0.67 

88.50 

0.65 

78.40 

0.65 

0.63 

89.10 

0.61 

96.30 

0.61 

90.84 

0.66 

88.92 

0.61 

83.78 

0.61 

0.42 

91.30 

0.54 

97.47 

0.57 

89.08 

0.45 

84.48 

0.44 

80.26 

0.43 

0.27 

94.22 

0.46 

98.60 

0.49 

89.38 

0.36 

85.74 

0.34 

86.64 

0.33 

50 

1.23 

84.90 

0.30 

85.50 

0.30 

89.90 

0.34 

89.66 

0.34 

85.32 

0.34 

0.91 

84.60 

0.31 

87.67 

0.31 

87.38 

0.31 

86.08 

0.31 

79.34 

0.31 

0.63 

90.32 

0.28 

91.93 

0.28 

92.58 

0.30 

90.90 

0.30 

89.96 

0.30 

0.42 

93.98 

0.25 

95.93 

0.25 

91.80 

0.23 

90.50 

0.22 

88.56 

0.22 

0.27 

96.92 

0.21 

96.60 

0.21 

90.96 

0.18 

90.84 

0.17 

94.10 

0.17 

100 

1.23 

81.26 

0.21 

81.73 

0.21 

88.64 

0.25 

87.94 

0.25 

83.36 

0.25 

0.91 

78.76 

0.22 

79.93 

0.22 

82.38 

0.22 

80.54 

0.22 

73.86 

0.22 

0.63 

89.86 

0.20 

90.80 

0.20 

93.18 

0.22 

91.74 

0.22 

90.76 

0.22 

0.42 

94.42 

0.18 

94.10 

0.18 

92.90 

0.17 

92.38 

0.17 

90.68 

0.17 

0.27 

96.12 

0.15 

94.70 

0.15 

89.18 

0.13 

90.72 

0.13 

94.04 

0.13 

250 

1.23 

74.04 

0.13 

72.63 

0.13 

83.92 

0.16 

82.98 

0.16 

78.64 

0.16 

0.91 

63.38 

0.14 

62.83 

0.14 

70.50 

0.14 

67.62 

0.14 

62.14 

0.14 

0.63 

88.92 

0.12 

89.17 

0.12 

92.76 

0.14 

91.84 

0.14 

90.96 

0.14 

0.42 

94.32 

0.11 

95.07 

0.11 

93.70 

0.11 

93.14 

0.11 

91.78 

0.11 

0.27 

93.38 

0.09 

90.87 

0.09 

85.62 

0.08 

87.24 

0.08 

91.20 

0.08 

Gamma  w/ 

10 

1.23 

92.36 

0.69 

95.37 

0.65 

91.68 

0.70 

92.94 

0.70 

84.58 

0.71 

Box-Cox 

0.91 

91.52 

0.69 

95.67 

0.67 

90.68 

0.66 

89.66 

0.68 

85.50 

0.67 

0.63 

89.80 

0.62 

94.43 

0.60 

90.64 

0.61 

86.90 

0.60 

87.04 

0.60 

0.42 

91.58 

0.57 

94.03 

0.59 

89.40 

0.50 

85.56 

0.48 

85.78 

0.48 

0.27 

90.70 

0.48 

92.83 

0.52 

89.38 

0.44 

84.16 

0.40 

86.14 

0.39 

50 

1.23 

94.32 

0.32 

95.43 

0.40 

94.20 

0.31 

94.24 

0.31 

91.72 

0.31 

0.91 

94.16 

0.32 

95.03 

0.41 

94.14 

0.32 

93.60 

0.32 

93.32 

0.32 

0.63 

92.98 

0.29 

94.00 

0.37 

94.02 

0.30 

93.54 

0.30 

94.14 

0.30 

0.42 

94.52 

0.26 

94.27 

0.34 

92.66 

0.25 

92.08 

0.25 

93.18 

0.25 

0.27 

92.96 

0.22 

91.53 

0.29 

90.84 

0.23 

91.32 

0.22 

93.66 

0.22 

100 

1.23 

94.22 

0.22 

94.43 

0.31 

94.56 

0.22 

94.44 

0.22 

92.74 

0.22 

0.91 

93.86 

0.23 

94.53 

0.32 

94.14 

0.23 

93.96 

0.23 

93.76 

0.23 

0.63 

92.28 

0.20 

92.47 

0.29 

93.18 

0.22 

93.20 

0.22 

94.38 

0.22 

0.42 

94.82 

0.18 

94.60 

0.26 

93.22 

0.18 

93.28 

0.18 

94.48 

0.18 

0.27 

91.74 

0.16 

90.87 

0.22 

89.20 

0.16 

91.34 

0.16 

93.42 

0.16 

250 

1.23 

93.92 

0.14 

94.40 

0.22 

94.18 

0.14 

94.30 

0.14 

92.98 

0.14 

Table B.2 - continued from previous page


Delta 

GCI 

BCa 

BP 

AN 

BC^ 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

0.91 

94.36 

0.14 

95.23 

0.23 

94.86 

0.15 

94.78 

0.15 

94.78 

0.15 

0.63 

92.40 

0.13 

92.80 

0.20 

93.80 

0.14 

94.10 

0.14 

94.74 

0.14 

0.42 

94.58 

0.12 

94.07 

0.18 

93.62 

0.12 

93.66 

0.12 

94.56 

0.12 

0.27 

85.76 

0.10 

88.13 

0.16 

82.72 

0.10 

85.56 

0.10 

88.52 

0.10 

Normal Mixture

10

1.23

87.84

2.39

93.20

0.72

86.60

0.68

87.28

0.70

79.32

0.70

0.91 

90.14 

0.75 

95.23 

0.75 

91.08 

0.73 

88.64 

0.73 

85.58 

0.73 

0.63 

87.54 

0.67 

94.63 

0.69 

89.90 

0.66 

85.40 

0.62 

81.64 

0.62 

0.42 

87.94 

0.57 

94.80 

0.61 

89.26 

0.56 

83.78 

0.52 

83.92 

0.51 

0.27 

84.08 

0.44 

93.67 

0.49 

89.36 

0.47 

81.78 

0.43 

83.04 

0.41 

50 

1.23 

79.42 

0.35 

82.73 

0.35 

82.56 

0.35 

81.42 

0.34 

76.36 

0.35 

0.91 

94.16 

0.35 

95.07 

0.35 

94.22 

0.35 

93.90 

0.35 

93.52 

0.35 

0.63 

88.54 

0.31 

91.50 

0.31 

92.72 

0.32 

90.62 

0.32 

87.38 

0.32 

0.42 

91.12 

0.26 

93.60 

0.27 

94.14 

0.28 

92.28 

0.27 

91.80 

0.27 

0.27 

88.88 

0.21 

91.30 

0.21 

94.20 

0.24 

91.94 

0.23 

92.28 

0.23 

100 

1.23 

67.10 

0.25 

68.17 

0.25 

70.50 

0.25 

69.44 

0.25 

64.82 

0.25 

0.91 

93.90 

0.25 

94.40 

0.25 

94.78 

0.25 

94.64 

0.25 

94.76 

0.25 

0.63 

85.48 

0.22 

88.07 

0.22 

90.94 

0.23 

89.20 

0.23 

86.36 

0.23 

0.42 

91.20 

0.19 

93.13 

0.19 

94.56 

0.20 

93.74 

0.20 

93.08 

0.20 

0.27 

90.56 

0.15 

91.50 

0.15 

94.26 

0.17 

93.58 

0.17 

93.94 

0.17 

250 

1.23 

33.94 

0.16 

35.73 

0.16 

37.67 

0.16 

37.14 

0.16 

33.96 

0.16 

0.91 

94.30 

0.16 

94.60 

0.16 

93.86 

0.16 

93.84 

0.16 

94.38 

0.13 

0.63 

77.50 

0.14 

81.10 

0.14 

83.18 

0.15 

81.04 

0.15 

77.88 

0.15 

0.42 

90.98 

0.12 

92.83 

0.12 

93.64 

0.13 

92.78 

0.13 

91.92 

0.16 

0.27 

90.94 

0.09 

91.70 

0.09 

94.58 

0.11 

93.72 

0.11 

94.04 

0.16 

GCI  -  generalized  confidence  interval;  BCa  -  bias  corrected  and  accelerated;  BP  -  basic  percentile; 
AN  -  asymptotic  normal;  Cov  -  coverage;  Len  -  length 
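The Cov and Len columns in these tables come from a Monte Carlo experiment: many data sets are simulated for each setting, an interval is computed from each, Cov is the percentage of those intervals that contain the true value, and Len summarizes their widths (taken here to be the average width). The Python sketch below shows that bookkeeping for a deliberately simple stand-in problem, a normal mean, rather than Bayes Cost or its thresholds. The function names, the sample size, the replicate and resample counts, and the reading of BP as a plain percentile bootstrap are assumptions made for illustration; this is not the dissertation's simulation code.

```python
import random
import statistics
from statistics import NormalDist


def an_interval(sample, level=0.95):
    """Asymptotic normal (Wald) interval for the mean: xbar +/- z * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return xbar - z * se, xbar + z * se


def percentile_interval(sample, rng, n_boot=200, level=0.95):
    """Percentile bootstrap interval: empirical quantiles of resampled means.

    Treating BP as this plain percentile interval is an assumption.
    """
    n = len(sample)
    boot_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lo = boot_means[int(alpha * (n_boot - 1))]
    hi = boot_means[int((1 - alpha) * (n_boot - 1))]
    return lo, hi


def coverage_and_length(make_interval, true_value, n, n_rep, rng):
    """Simulate n_rep samples of size n from N(true_value, 1).

    Returns (coverage in percent, average interval length).
    """
    hits, total_length = 0, 0.0
    for _ in range(n_rep):
        sample = [rng.gauss(true_value, 1.0) for _ in range(n)]
        lo, hi = make_interval(sample)
        hits += lo <= true_value <= hi  # a True comparison counts as 1
        total_length += hi - lo
    return 100 * hits / n_rep, total_length / n_rep


if __name__ == "__main__":
    rng = random.Random(2024)
    methods = {
        "AN": an_interval,
        "BP": lambda s: percentile_interval(s, rng),
    }
    for name, make_interval in methods.items():
        cov, length = coverage_and_length(make_interval, 0.0, n=30, n_rep=500, rng=rng)
        print(f"{name}: Cov = {cov:.2f}%, Len = {length:.2f}")
```

Swapping the stand-in statistic for an estimated Bayes Cost or threshold, and the two interval functions for the methods named in the table notes, would give the analogue of one table row. With this correctly specified toy model, both printed coverages should come out close to, and typically a little below, 95%.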
Table B.3: Coverage probability and length for 95% parametric CIs around θ₁ under equal costs and
three classes with a normally distributed feature.


Delta 

GCI 

BCa 

BP 

AN 

BC3 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Normal 

10 

1.23 

91.40 

4.95 

97.20 

2.42 

87.6 

23.4 

93.54 

43.0 

93.08 

1.87 

0-3-1 

0.91 

92.50 

1.04 

97.10 

1.40 

91.24 

2.44 

92.36 

5.52 

93.04 

1.06 

0.63 

92.04 

0.91 

97.30 

1.14 

92.74 

0.97 

92.56 

1.07 

92.24 

0.95 

0.42 

92.20 

0.94 

96.70 

1.10 

93.32 

1.02 

92.74 

1.00 

92.40 

1.00 

0.27 

92.26 

1.03 

96.07 

1.14 

93.32 

1.13 

92.68 

1.09 

92.48 

1.11 

50 

1.23 

93.80 

0.58 

95.73 

0.63 

92.32 

0.58 

92.68 

1.29 

93.88 

0.58 

0.91 

94.56 

0.43 

95.50 

0.46 

93.48 

0.43 

93.84 

0.43 

94.24 

0.43 

0.63 

94.10 

0.39 

94.97 

0.41 

94.14 

0.40 

94.36 

0.40 

94.56 

0.40 

0.42 

94.20 

0.41 

94.83 

0.42 

94.60 

0.42 

94.42 

0.42 

94.52 

0.42 

0.27 

94.78 

0.46 

95.03 

0.46 

94.58 

0.46 

94.40 

0.46 

94.18 

0.46 

100 

1.23 

94.58 

0.41 

95.10 

0.42 

93.56 

0.41 

93.64 

0.40 

94.58 

0.41 

0.91 

94.76 

0.30 

95.37 

0.31 

94.14 

0.30 

94.38 

0.30 

94.60 

0.30 

0.63 

95.08 

0.28 

95.37 

0.28 

94.28 

0.28 

94.46 

0.28 

94.32 

0.28 

0.42 

95.12 

0.29 

95.30 

0.29 

94.40 

0.29 

94.24 

0.29 

94.16 

0.29 

0.27 

94.92 

0.32 

95.27 

0.32 

94.40 

0.32 

94.28 

0.32 

94.16 

0.32 

250 

1.23 

94.60 

0.26 

94.90 

0.26 

94.94 

0.26 

94.92 

0.25 

95.00 

0.26 

0.91 

95.00 

0.19 

95.13 

0.19 

94.94 

0.19 

94.98 

0.19 

94.94 

0.19 

0.63 

94.80 

0.18 

94.97 

0.18 

94.68 

0.18 

94.90 

0.18 

94.68 

0.18 

0.42 

95.20 

0.18 

95.27 

0.18 

94.84 

0.19 

94.82 

0.18 

94.80 

0.19 

0.27 

95.16 

0.20 

95.33 

0.20 

94.70 

0.20 

94.52 

0.20 

94.66 

0.20 

Normal 

10 

1.23 

92.22 

10.0 

97.20 

2.42 

87.58 

23.4 

93.54 

43.0 

93.08 

1.87 

0-3^2 

0.91 

93.28 

1.23 

97.10 

1.40 

91.24 

2.44 

92.36 

5.52 

93.04 

1.06 

0.63 

92.86 

0.91 

97.30 

1.14 

92.74 

0.97 

92.56 

1.07 

92.24 

0.95 

0.42 

92.32 

0.94 

96.70 

1.10 

93.32 

1.02 

92.74 

1.00 

92.40 

1.00 

0.27 

92.22 

1.03 

96.07 

1.14 

93.32 

1.13 

92.68 

1.09 

92.48 

1.11 

50 

1.23 

93.84 

0.58 

95.73 

0.63 

92.32 

0.58 

92.68 

1.29 

93.88 

0.58 

0.91 

94.54 

0.43 

95.50 

0.46 

93.48 

0.43 

93.84 

0.43 

94.24 

0.43 

0.63 

94.10 

0.39 

94.97 

0.41 

94.14 

0.40 

94.36 

0.40 

94.56 

0.40 

0.42 

94.18 

0.41 

94.83 

0.42 

94.60 

0.42 

94.42 

0.42 

94.52 

0.42 

0.27 

94.24 

0.46 

95.03 

0.46 

94.58 

0.46 

94.40 

0.46 

94.18 

0.46 

100 

1.23 

94.66 

0.41 

95.10 

0.42 

93.56 

0.41 

93.64 

0.40 

94.58 

0.41 

0.91 

95.04 

0.30 

95.37 

0.31 

94.14 

0.30 

94.38 

0.30 

94.60 

0.30 

0.63 

95.08 

0.28 

95.37 

0.28 

94.28 

0.28 

94.46 

0.28 

94.32 

0.28 

0.42 

95.14 

0.29 

95.30 

0.29 

94.40 

0.29 

94.24 

0.29 

94.16 

0.29 

0.27 

94.92 

0.32 

95.27 

0.32 

94.40 

0.32 

94.28 

0.32 

94.16 

0.32 

250 

1.23 

94.58 

0.26 

94.90 

0.26 

94.94 

0.26 

94.92 

0.25 

95.00 

0.26 

Table B.3 - continued from previous page


Delta 

GCI 

BCa 

BP 

AN 

BC3 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

0.91 

94.64 

0.19 

95.13 

0.19 

94.94 

0.19 

94.98 

0.19 

94.94 

0.19 

0.63 

94.80 

0.18 

94.97 

0.18 

94.68 

0.18 

94.90 

0.18 

94.68 

0.18 

0.42 

95.20 

0.18 

95.27 

0.18 

94.84 

0.19 

94.82 

0.18 

94.80 

0.19 

0.27 

95.18 

0.20 

95.33 

0.20 

94.70 

0.20 

94.52 

0.20 

94.66 

0.20 

Normal 

10 

1.23 

92.14 

11.3 

97.20 

2.42 

87.58 

23.4 

93.54 

43.0 

93.08 

1.87 

0-3-4 

0.91 

93.28 

1.24 

97.10 

1.40 

91.24 

2.44 

92.36 

5.52 

93.04 

1.06 

0.63 

92.86 

0.91 

97.30 

1.14 

92.74 

0.97 

92.56 

1.07 

92.24 

0.95 

0.42 

92.36 

0.94 

96.70 

1.10 

93.32 

1.02 

92.74 

1.00 

92.40 

1.00 

0.27 

92.22 

1.03 

96.07 

1.14 

93.32 

1.13 

92.68 

1.09 

92.48 

1.11 

50 

1.23 

93.84 

0.58 

95.73 

0.63 

92.32 

0.58 

92.68 

1.29 

93.88 

0.58 

0.91 

94.54 

0.43 

95.50 

0.46 

93.48 

0.43 

93.84 

0.43 

94.24 

0.43 

0.63 

94.10 

0.39 

94.97 

0.41 

94.14 

0.40 

94.36 

0.40 

94.56 

0.40 

0.42 

94.20 

0.41 

94.83 

0.42 

94.60 

0.42 

94.42 

0.42 

94.52 

0.42 

0.27 

94.24 

0.46 

95.03 

0.46 

94.58 

0.46 

94.40 

0.46 

94.18 

0.46 

100 

1.23 

94.68 

0.41 

95.10 

0.42 

93.56 

0.41 

93.64 

0.40 

94.58 

0.41 

0.91 

95.04 

0.30 

95.37 

0.31 

94.14 

0.30 

94.38 

0.30 

94.60 

0.30 

0.63 

95.08 

0.28 

95.37 

0.28 

94.28 

0.28 

94.46 

0.28 

94.32 

0.28 

0.42 

95.12 

0.29 

95.30 

0.29 

94.40 

0.29 

94.24 

0.29 

94.16 

0.29 

0.27 

94.92 

0.32 

95.27 

0.32 

94.40 

0.32 

94.28 

0.32 

94.16 

0.32 

250 

1.23 

94.60 

0.26 

94.90 

0.26 

94.94 

0.26 

94.92 

0.25 

95.00 

0.26 

0.91 

94.64 

0.19 

95.13 

0.19 

94.94 

0.19 

94.98 

0.19 

94.94 

0.19 

0.63 

94.80 

0.18 

94.97 

0.18 

94.68 

0.18 

94.90 

0.18 

94.68 

0.18 

0.42 

95.20 

0.18 

95.27 

0.18 

94.84 

0.19 

94.82 

0.18 

94.80 

0.19 

0.27 

95.18 

0.20 

95.33 

0.20 

94.70 

0.20 

94.52 

0.20 

94.66 

0.20 

GCI  -  generalized  confidence  interval;  BCa  -  bias  corrected  and  accelerated;  BP  -  basic  percentile; 
AN  -  asymptotic  normal;  Cov  -  coverage;  Len  -  length 
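Every Cov entry is itself an estimate from a finite number of simulated data sets, so a value slightly under 95 does not by itself indicate under-coverage. Treating the count of covering intervals as binomial, the normal approximation gives the range of estimated coverages consistent with a true coverage of exactly 95%. The number of replicates behind these tables is not stated in this section; the sketch assumes 5,000 purely as a hypothetical value, and the helper names are illustrative.

```python
from statistics import NormalDist


def coverage_band(nominal=0.95, n_rep=5000, level=0.95):
    """Range (in percent) of estimated coverage consistent with the nominal rate.

    Normal approximation to Binomial(n_rep, nominal):
    nominal +/- z * sqrt(nominal * (1 - nominal) / n_rep).
    n_rep = 5000 is a hypothetical replicate count.
    """
    se = (nominal * (1 - nominal) / n_rep) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 100 * (nominal - z * se), 100 * (nominal + z * se)


def flag(cov_pct, nominal=0.95, n_rep=5000):
    """Label a tabulated coverage as below, within, or above the Monte Carlo band."""
    lo, hi = coverage_band(nominal, n_rep)
    if cov_pct < lo:
        return "below"
    if cov_pct > hi:
        return "above"
    return "within"


if __name__ == "__main__":
    lo, hi = coverage_band()
    print(f"band for n_rep = 5000: {lo:.2f}% to {hi:.2f}%")
    # Two coverage values that appear in Table B.3
    for cov in (91.40, 94.60):
        print(cov, flag(cov))
```

Under the assumed 5,000 replicates the band is roughly 94.4% to 95.6%, so 94.60 is consistent with nominal coverage while 91.40 is not; with fewer replicates the band widens.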
Table B.4: Coverage probability and length for 95% parametric CIs around θ₂ under equal costs and
three classes with a normally distributed feature.


Delta 

GCI 

BCa 

BP 

AN 

BC3 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Normal 

10 

1.23 

91.52 

1.85 

97.00 

2.46 

87.92 

43.7 

93.38 

20.7 

92.56 

1.90 

0-3-1 

0.91 

92.42 

1.09 

96.73 

1.40 

91.10 

1.28 

92.24 

3.16 

92.56 

1.06 

0.63 

92.32 

0.91 

96.33 

1.14 

92.76 

0.97 

92.74 

1.11 

92.42 

0.95 

0.42 

92.36 

0.94 

95.80 

1.09 

93.56 

1.02 

92.96 

0.99 

92.48 

1.01 

0.27 

92.48 

1.03 

95.53 

1.13 

94.00 

1.13 

92.92 

1.09 

92.34 

1.11 

50 

1.23 

93.96 

0.58 

95.97 

0.64 

92.34 

0.58 

92.74 

0.62 

94.28 

0.58 

0.91 

94.60 

0.43 

95.57 

0.46 

93.56 

0.43 

93.92 

0.43 

94.58 

0.43 

0.63 

94.74 

0.39 

95.23 

0.41 

94.72 

0.40 

94.78 

0.40 

94.68 

0.40 

0.42 

94.58 

0.41 

94.83 

0.42 

94.72 

0.42 

94.66 

0.42 

94.42 

0.42 

0.27 

94.16 

0.46 

94.47 

0.46 

94.62 

0.46 

94.36 

0.46 

94.20 

0.46 

100 

1.23 

94.46 

0.41 

95.13 

0.42 

93.22 

0.41 

93.48 

0.40 

94.26 

0.40 

0.91 

94.80 

0.30 

95.70 

0.31 

94.10 

0.30 

94.40 

0.30 

94.58 

0.30 

0.63 

94.94 

0.28 

95.03 

0.28 

94.42 

0.28 

94.54 

0.28 

94.56 

0.28 

0.42 

95.00 

0.29 

94.93 

0.29 

94.68 

0.29 

94.84 

0.29 

94.76 

0.29 

0.27 

95.18 

0.32 

95.00 

0.32 

95.14 

0.32 

95.10 

0.32 

95.08 

0.32 

250 

1.23 

95.40 

0.26 

95.20 

0.26 

94.34 

0.26 

94.66 

0.26 

94.84 

0.26 

0.91 

94.90 

0.19 

95.27 

0.19 

94.70 

0.19 

94.86 

0.19 

94.92 

0.19 

0.63 

95.20 

0.18 

95.20 

0.18 

95.08 

0.18 

95.20 

0.18 

95.38 

0.18 

0.42 

95.12 

0.18 

95.40 

0.18 

95.20 

0.19 

95.20 

0.18 

95.22 

0.18 

0.27 

94.88 

0.20 

95.33 

0.20 

94.92 

0.20 

94.96 

0.20 

94.96 

0.20 

Normal 

10 

1.23 

91.54 

17.0 

96.43 

2.88 

89.24 

179 

92.20 

80.3 

88.28 

2.26 

0-3^2 

0.91 

92.26 

9.82 

96.67 

1.76 

90.24 

1.85 

91.36 

5.81 

89.02 

1.37 

0.63 

92.64 

3.53 

96.27 

1.49 

91.74 

1.31 

91.58 

1.89 

90.34 

1.27 

0.42 

92.24 

2.18 

95.80 

1.47 

92.58 

1.38 

92.20 

1.34 

91.14 

1.35 

0.27 

92.14 

1.41 

95.23 

1.55 

93.42 

1.54 

92.28 

1.48 

92.06 

1.50 

50 

1.23 

94.22 

0.66 

95.27 

0.69 

92.84 

0.67 

93.38 

0.67 

92.78 

0.66 

0.91 

94.28 

0.57 

95.20 

0.59 

93.16 

0.58 

93.04 

0.57 

92.98 

0.57 

0.63 

94.18 

0.55 

94.53 

0.56 

93.86 

0.55 

93.82 

0.54 

93.52 

0.55 

0.42 

94.48 

0.57 

94.53 

0.58 

94.14 

0.57 

93.92 

0.57 

93.68 

0.57 

0.27 

94.62 

0.62 

94.60 

0.63 

93.82 

0.63 

93.86 

0.62 

93.58 

0.62 

100 

1.23 

95.06 

0.47 

94.97 

0.47 

93.76 

0.47 

93.60 

0.46 

93.24 

0.46 

0.91 

95.16 

0.41 

95.07 

0.41 

93.90 

0.41 

94.00 

0.40 

93.44 

0.40 

0.63 

95.20 

0.39 

94.90 

0.39 

94.58 

0.39 

94.24 

0.39 

94.14 

0.39 

0.42 

95.26 

0.40 

94.73 

0.41 

94.80 

0.40 

94.64 

0.40 

94.72 

0.40 

0.27 

95.34 

0.44 

94.93 

0.44 

95.26 

0.44 

95.10 

0.44 

95.14 

0.44 

250 

1.23 

94.82 

0.29 

95.50 

0.30 

94.88 

0.30 

94.88 

0.29 

94.98 

0.29 

Table B.4 - continued from previous page


Delta 

GCI 

BCa 

BP 

AN 

BC3 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

0.91 

94.86 

0.26 

95.33 

0.26 

95.24 

0.26 

95.16 

0.26 

95.20 

0.26 

0.63 

95.18 

0.24 

94.83 

0.25 

95.22 

0.25 

95.14 

0.24 

95.20 

0.25 

0.42 

95.24 

0.25 

95.07 

0.25 

95.50 

0.25 

95.28 

0.25 

95.08 

0.25 

0.27 

95.08 

0.28 

94.73 

0.28 

95.20 

0.28 

95.04 

0.28 

94.96 

0.28 

Normal 

10 

1.23 

91.10 

24.7 

96.10 

2.35 

89.30 

14.7 

90.66 

11.4 

84.70 

1.95 

0-3-4 

0.91 

91.46 

6.41 

95.97 

1.92 

89.00 

2.14 

89.66 

2.58 

84.64 

1.55 

0.63 

92.06 

1.59 

95.93 

1.84 

90.30 

1.65 

90.16 

1.69 

87.00 

1.54 

0.42 

92.14 

1.65 

96.10 

1.89 

91.66 

1.75 

91.04 

1.67 

88.94 

1.66 

0.27 

92.24 

1.79 

95.60 

2.02 

92.86 

1.95 

91.96 

1.84 

90.04 

1.85 

50 

1.23 

94.32 

0.75 

95.23 

0.78 

92.96 

0.76 

92.90 

0.74 

92.14 

0.74 

0.91 

93.80 

0.71 

95.13 

0.73 

92.86 

0.71 

93.00 

0.69 

91.84 

0.70 

0.63 

94.18 

0.71 

94.97 

0.72 

93.12 

0.70 

93.46 

0.69 

92.42 

0.69 

0.42 

94.20 

0.73 

95.03 

0.75 

93.30 

0.73 

93.28 

0.72 

92.84 

0.72 

0.27 

94.34 

0.78 

95.07 

0.80 

93.48 

0.79 

93.50 

0.77 

93.00 

0.78 

100 

1.23 

94.60 

0.53 

94.60 

0.54 

94.08 

0.54 

93.38 

0.53 

92.56 

0.53 

0.91 

95.00 

0.50 

94.93 

0.51 

94.00 

0.51 

93.56 

0.50 

92.98 

0.50 

0.63 

95.28 

0.50 

95.00 

0.50 

94.18 

0.50 

93.92 

0.49 

94.00 

0.50 

0.42 

95.32 

0.52 

94.93 

0.52 

94.30 

0.52 

94.64 

0.51 

94.22 

0.51 

0.27 

95.32 

0.55 

94.87 

0.56 

94.86 

0.56 

94.94 

0.55 

94.78 

0.55 

250 

1.23 

95.06 

0.34 

95.43 

0.34 

94.96 

0.34 

94.82 

0.34 

94.68 

0.34 

0.91 

94.82 

0.32 

95.83 

0.32 

94.94 

0.32 

94.88 

0.32 

94.66 

0.32 

0.63 

95.16 

0.32 

95.00 

0.32 

94.66 

0.32 

94.76 

0.31 

94.78 

0.32 

0.42 

95.22 

0.33 

94.90 

0.33 

94.82 

0.33 

94.80 

0.32 

94.56 

0.33 

0.27 

95.10 

0.35 

94.60 

0.35 

94.58 

0.35 

94.88 

0.35 

94.60 

0.35 

GCI  -  generalized  confidence  interval;  BCa  -  bias  corrected  and  accelerated;  BP  -  basic  percentile; 
AN  -  asymptotic  normal;  Cov  -  coverage;  Len  -  length 
Table B.5: Coverage probability and length for 95% parametric CIs around θ₁ under equal costs and
three classes with a non-normally distributed feature.


Delta  GCI  BCa  BP  AN 


BC3 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Gamma 

10 

1.23 

54.30 

1.67 

65.33 

2.76 

55.14 

1.77 

53.56 

1.80 

83.40 

1.78 

0.91 

55.28 

4.66 

65.33 

2.76 

55.14 

1.77 

53.56 

1.80 

83.40 

1.78 

0.63 

55.44 

1.76 

64.50 

2.84 

55.26 

1.81 

53.84 

1.84 

83.58 

1.81 

0.42 

72.24 

1.87 

74.67 

2.18 

81.78 

2.48 

80.20 

2.45 

85.88 

2.38 

0.27 

72.18 

1.87 

74.67 

2.18 

81.78 

2.48 

80.20 

2.45 

85.88 

2.38 

50 

1.23 

1.58 

0.71 

2.37 

0.75 

1.94 

0.94 

1.64 

0.91 

6.88 

0.92 

0.91 

1.58 

0.71 

2.37 

0.75 

1.94 

0.94 

1.64 

0.91 

6.88 

0.92 

0.63 

1.58 

0.71 

1.97 

0.74 

2.08 

0.93 

1.80 

0.91 

7.02 

0.91 

0.42 

37.30 

0.87 

33.10 

0.89 

56.52 

1.43 

62.30 

1.37 

74.84 

1.37 

0.27 

37.28 

0.87 

33.10 

0.89 

56.52 

1.43 

62.30 

1.37 

74.84 

1.37 

100 

1.23 

0.00 

0.49 

0.07 

0.50 

0.28 

0.67 

0.06 

0.64 

0.48 

0.64 

0.91 

0.00 

0.49 

0.07 

0.50 

0.28 

0.67 

0.06 

0.64 

0.48 

0.64 

0.63 

0.08 

0.49 

0.03 

0.50 

0.18 

0.67 

0.08 

0.64 

0.48 

0.64 

0.42 

15.88 

0.62 

14.53 

0.62 

30.18 

1.06 

37.08 

1.02 

45.58 

1.02 

0.27 

15.92 

0.62 

14.53 

0.62 

30.18 

1.06 

37.08 

1.02 

45.58 

1.02 

250 

1.23 

0.00 

0.31 

0.00 

0.31 

0.00 

0.42 

0.00 

0.41 

0.00 

0.41 

0.91 

0.00 

0.31 

0.00 

0.31 

0.00 

0.42 

0.00 

0.41 

0.00 

0.41 

0.63 

0.00 

0.31 

0.00 

0.31 

0.00 

0.42 

0.00 

0.41 

0.00 

0.41 

0.42 

1.00 

0.39 

0.73 

0.39 

4.16 

0.70 

6.38 

0.68 

8.10 

0.68 

0.27 

1.10 

0.39 

0.73 

0.39 

4.16 

0.70 

6.38 

0.68 

8.10 

0.68 

Gamma w/ Box-Cox

10

1.23

89.02

2.11

95.10

63.7

88.90

1.62

87.68

1.65

93.50

1.64

0.91 

84.24 

3.13 

94.77 

122 

84.64 

1.52 

82.14 

1.63 

91.56 

1.62 

0.63 

78.60 

242 

91.47 

567 

78.18 

1.43 

74.60 

1.64 

89.60 

1.63 

0.42 

91.12 

2.19 

95.93 

2.59 

91.64 

2.39 

88.96 

2.38 

91.40 

2.38 

0.27 

89.10 

2.17 

94.57 

2.55 

89.84 

2.35 

86.00 

2.42 

90.74 

2.41 

50 

1.23 

94.06 

0.73 

94.93 

0.79 

93.94 

0.75 

92.88 

0.74 

94.18 

0.74 

0.91 

73.76 

0.68 

75.60 

0.73 

71.58 

0.69 

65.70 

0.69 

75.70 

0.70 

0.63 

37.12 

0.60 

40.90 

0.66 

33.72 

0.63 

28.64 

0.65 

43.46 

0.65 

0.42 

86.80 

0.94 

87.67 

0.97 

88.82 

1.00 

85.58 

1.00 

89.24 

1.00 

0.27 

73.68 

0.92 

74.30 

0.95 

74.98 

0.99 

70.62 

0.99 

79.46 

1.00 

100 

1.23 

94.12 

0.51 

94.97 

0.53 

94.26 

0.52 

93.74 

0.51 

94.38 

0.52 

0.91 

53.18 

0.47 

57.20 

0.49 

52.32 

0.48 

46.16 

0.47 

55.18 

0.48 

0.63 

10.20 

0.43 

10.13 

0.44 

10.04 

0.43 

7.44 

0.44 

13.26 

0.44 

0.42 

79.14 

0.66 

79.27 

0.67 

81.88 

0.71 

79.00 

0.71 

82.44 

0.71 

0.27 

53.36 

0.65 

51.47 

0.66 

56.88 

0.70 

53.04 

0.70 

60.14 

0.70 

250 

1.23 

93.32 

0.32 

93.67 

0.33 

93.88 

0.32 

92.80 

0.32 

93.76 

0.32 

Table B.5 - continued from previous page


Delta 

GCI 

BCa 

BP 

AN 

BC^ 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

0.91 

15.38 

0.30 

16.27 

0.30 

15.40 

0.30 

12.74 

0.29 

16.22 

0.30 

0.63 

0.10 

0.27 

0.17 

0.27 

0.20 

0.26 

0.10 

0.26 

0.22 

0.27 

0.42 

57.02 

0.42 

57.07 

0.42 

61.22 

0.45 

58.58 

0.45 

61.94 

0.45 

0.27 

15.76 

0.41 

15.10 

0.41 

18.26 

0.44 

16.74 

0.44 

19.82 

0.44 

Normal Mixture

10

1.23

87.10

1.28

95.37

6.03

91.80

1.46

89.46

1.48

87.48

1.42

0.91 

87.02 

1.28 

95.97 

1.85 

91.80 

1.46 

89.46 

1.48 

87.48 

1.42 

0.63 

87.02 

1.28 

94.13 

1.47 

91.80 

1.46 

89.46 

1.48 

87.48 

1.42 

0.42 

90.96 

1.32 

94.13 

1.47 

89.00 

1.99 

90.50 

5.14 

88.74 

1.35 

0.27 

89.56 

90.2 

94.13 

1.47 

85.88 

159 

94.10 

381 

90.50 

5.27 

50 

1.23 

76.34 

0.56 

90.40 

0.97 

78.20 

0.56 

76.34 

0.56 

74.38 

0.56 

0.91 

76.34 

0.56 

93.33 

0.59 

78.20 

0.56 

76.34 

0.56 

74.36 

0.56 

0.63 

76.34 

0.56 

80.93 

0.57 

78.20 

0.56 

76.34 

0.56 

74.36 

0.56 

0.42 

91.10 

0.57 

80.93 

0.57 

90.80 

0.57 

90.04 

0.55 

89.08 

0.56 

0.27 

92.30 

0.84 

80.93 

0.57 

90.64 

1.06 

93.20 

4.68 

93.42 

0.99 

100 

1.23 

57.86 

0.39 

87.13 

0.59 

60.10 

0.39 

58.00 

0.39 

55.64 

0.39 

0.91 

57.86 

0.39 

90.67 

0.41 

60.10 

0.39 

58.00 

0.39 

55.64 

0.39 

0.63 

57.86 

0.39 

61.93 

0.40 

60.10 

0.39 

58.00 

0.39 

55.64 

0.39 

0.42 

89.64 

0.40 

61.93 

0.40 

89.34 

0.40 

87.90 

0.40 

87.26 

0.40 

0.27 

88.00 

0.57 

61.93 

0.40 

87.80 

0.61 

88.90 

0.63 

90.76 

0.61 

250 

1.23 

20.26 

0.25 

75.53 

0.36 

21.92 

0.25 

20.32 

0.24 

19.66 

0.25 

0.91 

20.26 

0.25 

84.27 

0.26 

21.92 

0.25 

20.32 

0.24 

19.66 

0.25 

0.63 

20.26 

0.25 

21.77 

0.25 

21.92 

0.25 

20.32 

0.24 

19.66 

0.25 

0.42 

83.08 

0.25 

21.77 

0.25 

84.82 

0.25 

82.66 

0.25 

82.12 

0.25 

0.27 

77.98 

0.35 

21.77 

0.25 

78.30 

0.38 

78.96 

0.37 

82.56 

0.38 

GCI  -  generalized  confidence  interval;  BCa  -  bias  corrected  and  accelerated;  BP  -  basic  percentile; 
AN  -  asymptotic  normal;  Cov  -  coverage;  Len  -  length 
Table B.6: Coverage probability and length for 95% parametric CIs around θ₂ under equal costs and
three classes with a non-normally distributed feature.


Delta  GCI  BCa  BP  AN 


BCa 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Gamma 

10 

1.23 

66.20 

3.67 

78.17 

6.95 

58.90 

3.03 

58.22 

3.17 

85.56 

3.17 

0.91 

71.68 

2.92 

75.93 

3.44 

76.18 

3.62 

75.60 

3.58 

87.80 

3.52 

0.63 

70.54 

4.49 

82.67 

5.10 

79.32 

5.82 

76.02 

5.60 

73.00 

5.48 

0.42 

79.64 

7.73 

82.70 

8.92 

84.58 

9.66 

83.42 

9.47 

88.34 

9.37 

0.27 

74.26 

11.7 

83.27 

13.3 

81.34 

15.0 

78.50 

14.4 

75.52 

14.1 

50 

1.23 

11.74 

1.47 

15.10 

1.60 

14.22 

1.70 

11.50 

1.71 

26.92 

1.73 

0.91 

16.02 

1.33 

15.03 

1.36 

24.64 

1.92 

28.68 

1.86 

41.76 

1.86 

0.63 

61.82 

2.00 

65.10 

2.04 

77.92 

2.91 

74.80 

2.83 

71.58 

2.81 

0.42 

46.80 

3.52 

43.23 

3.59 

59.20 

4.82 

62.96 

4.72 

74.24 

4.73 

0.27 

66.76 

5.20 

72.20 

5.32 

82.02 

7.40 

78.60 

7.20 

75.80 

7.17 

100 

1.23 

1.64 

1.02 

2.13 

1.06 

4.24 

1.18 

2.30 

1.18 

6.82 

1.19 

0.91 

2.00 

0.95 

2.07 

0.96 

4.94 

1.41 

6.54 

1.37 

9.88 

1.37 

0.63 

49.78 

1.41 

54.30 

1.43 

75.16 

2.16 

70.12 

2.10 

67.40 

2.10 

0.42 

21.84 

2.49 

20.50 

2.52 

33.64 

3.52 

38.20 

3.46 

47.04 

3.46 

0.27 

57.92 

3.68 

61.13 

3.71 

79.58 

5.44 

75.06 

5.29 

72.74 

5.29 

250 

1.23 

0.00 

0.63 

0.00 

0.64 

0.08 

0.73 

0.02 

0.72 

0.10 

0.72 

0.91 

0.00 

0.60 

0.00 

0.60 

0.02 

0.91 

0.04 

0.90 

0.04 

0.90 

0.63 

27.62 

0.90 

30.00 

0.90 

57.86 

1.41 

51.14 

1.37 

49.56 

1.38 

0.42 

1.94 

1.58 

1.77 

1.59 

4.96 

2.29 

6.00 

2.26 

7.58 

2.26 

0.27 

39.44 

2.33 

42.33 

2.34 

67.82 

3.55 

61.84 

3.48 

59.84 

3.49 

Gamma w/ Box-Cox

10

1.23

89.58

3.65

92.60

999

84.64

2.85

85.46

3.03

92.92

3.02

0.91 

88.90 

3.06 

95.07 

3.89 

90.44 

3.33 

87.54 

3.35 

91.80 

3.35 

0.63 

90.02 

6.92 

93.30 

7.14 

90.84 

7.41 

92.56 

7.63 

87.66 

7.63 

0.42 

91.32 

8.27 

95.23 

9.61 

92.16 

9.02 

90.68 

9.03 

91.94 

9.07 

0.27 

90.92 

17.3 

94.77 

18.0 

92.32 

18.6 

92.32 

18.7 

90.56 

18.8 

50 

1.23 

94.76 

1.42 

95.13 

1.56 

93.42 

1.42 

92.98 

1.41 

94.64 

1.42 

0.91 

78.50 

1.31 

79.50 

1.36 

81.14 

1.42 

78.84 

1.42 

81.94 

1.42 

0.63 

88.38 

3.00 

90.40 

3.00 

89.94 

3.07 

91.30 

3.09 

87.54 

3.10 

0.42 

90.76 

3.63 

91.93 

3.70 

90.72 

3.79 

89.84 

3.78 

91.94 

3.80 

0.27 

92.00 

7.50 

93.00 

7.51 

91.70 

7.55 

93.08 

7.58 

90.82 

7.61 

100 

1.23 

94.06 

1.00 

95.03 

1.05 

93.20 

0.99 

92.56 

0.99 

93.98 

0.99 

0.91 

64.18 

0.93 

66.17 

0.94 

68.18 

1.01 

66.26 

1.00 

68.76 

1.01 

0.63 

84.88 

2.11 

86.93 

2.11 

87.02 

2.17 

88.42 

2.18 

84.46 

2.18 

0.42 

86.44 

2.56 

86.23 

2.58 

87.32 

2.67 

86.06 

2.66 

88.54 

2.68 

0.27 

89.60 

5.28 

90.43 

5.29 

90.26 

5.37 

91.26 

5.37 

89.04 

5.39 

250 

1.23 

93.70 

0.63 

94.20 

0.64 

93.24 

0.62 

92.90 

0.62 

93.38 

0.62 

Table B.6 - continued from previous page


Delta 

GCI 

BCa 

BP 

AN 

BCa 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

0.91 

31.34 

0.58 

34.57 

0.59 

35.56 

0.63 

34.16 

0.63 

35.74 

0.63 

0.63 

75.20 

1.33 

75.07 

1.33 

77.96 

1.38 

79.12 

1.38 

75.46 

1.38 

0.42 

74.76 

1.62 

74.57 

1.62 

76.52 

1.69 

75.28 

1.68 

77.96 

1.69 

0.27 

84.10 

3.34 

83.50 

3.34 

85.40 

3.41 

86.48 

3.40 

83.86 

3.42 

Normal Mixture

10

1.23

92.32

1.22

96.77

1.55

92.10

1.36

92.58

2.47

90.04

1.26

0.91 

92.22 

1.22 

96.77 

1.55 

92.10 

1.36 

92.58 

2.47 

90.04 

1.26 

0.63 

92.16 

1.13 

95.97 

1.40 

92.22 

1.21 

91.94 

1.54 

91.70 

1.18 

0.42 

92.52 

1.23 

94.70 

1.37 

92.30 

1.35 

92.68 

1.30 

91.94 

1.32 

0.27 

91.78 

2.08 

95.20 

2.01 

92.22 

2.07 

90.00 

1.98 

90.46 

2.03 

50 

1.23 

95.68 

0.52 

95.07 

0.54 

94.52 

0.52 

94.66 

0.51 

94.12 

0.51 

0.91 

95.68 

0.52 

95.07 

0.54 

94.52 

0.52 

94.66 

0.51 

94.12 

0.51 

0.63 

92.16 

0.49 

90.97 

0.51 

90.32 

0.49 

90.58 

0.49 

92.18 

0.49 

0.42 

88.16 

0.55 

88.50 

0.55 

87.24 

0.55 

87.88 

0.54 

89.44 

0.54 

0.27 

91.52 

0.86 

91.73 

0.87 

90.26 

0.82 

88.88 

0.81 

88.44 

0.82 

100 

1.23 

95.54 

0.37 

95.70 

0.37 

94.88 

0.36 

94.78 

0.36 

94.88 

0.36 

0.91 

95.52 

0.37 

95.70 

0.37 

94.88 

0.36 

94.78 

0.36 

94.88 

0.36 

0.63 

87.54 

0.35 

86.80 

0.35 

85.44 

0.35 

86.06 

0.34 

88.14 

0.35 

0.42 

82.02 

0.38 

79.90 

0.39 

78.58 

0.38 

79.72 

0.38 

81.76 

0.38 

0.27 

86.82 

0.61 

88.23 

0.61 

85.70 

0.58 

84.46 

0.57 

83.88 

0.58 

250 

1.23 

95.40 

0.23 

95.73 

0.23 

94.96 

0.23 

94.80 

0.23 

95.00 

0.23 

0.91 

95.40 

0.23 

95.73 

0.23 

94.96 

0.23 

94.80 

0.23 

95.00 

0.23 

0.63 

74.44 

0.22 

72.87 

0.22 

72.24 

0.22 

72.94 

0.22 

75.40 

0.22 

0.42 

59.00 

0.24 

58.47 

0.24 

56.84 

0.24 

58.06 

0.24 

60.10 

0.24 

0.27 

74.10 

0.38 

76.23 

0.38 

72.88 

0.37 

71.42 

0.36 

70.72 

0.37 

GCI  -  generalized  confidence  interval;  BCa  -  bias  corrected  and  accelerated;  BP  -  basic  percentile; 
AN  -  asymptotic  normal;  Cov  -  coverage;  Len  -  length 
Table B.7: Coverage probability and length for 95% parametric CIs around BC under unequal costs
and three classes with a normally distributed feature.


Delta  GCI  BCa  BP  AN 


BCa 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cost 1

10 

0.45 

91.10 

0.31 

96.73 

0.31 

91.78 

0.30 

89.84 

0.30 

87.54 

0.30 

0.31 

93.04 

0.27 

95.60 

0.28 

91.48 

0.26 

90.66 

0.25 

86.88 

0.25 

0.21 

92.18 

0.23 

96.27 

0.24 

90.90 

0.22 

88.66 

0.21 

86.26 

0.20 

0.14 

90.84 

0.19 

95.37 

0.21 

90.40 

0.19 

86.10 

0.17 

85.12 

0.16 

0.09 

89.94 

0.16 

94.87 

0.18 

89.94 

0.16 

83.90 

0.13 

85.26 

0.12 

50 

0.45 

94.50 

0.14 

95.37 

0.14 

95.14 

0.14 

94.46 

0.14 

93.54 

0.14 

0.31 

94.68 

0.12 

94.97 

0.12 

94.42 

0.12 

94.72 

0.12 

93.72 

0.12 

0.21 

94.64 

0.10 

94.90 

0.10 

94.30 

0.10 

94.08 

0.10 

93.16 

0.10 

0.14 

94.08 

0.09 

94.80 

0.09 

94.14 

0.09 

93.34 

0.08 

93.24 

0.08 

0.09 

93.56 

0.07 

95.07 

0.07 

94.00 

0.07 

92.64 

0.07 

93.20 

0.07 

100 

0.45 

95.16 

0.10 

95.07 

0.10 

94.60 

0.10 

94.20 

0.10 

94.06 

0.10 

0.31 

95.20 

0.08 

94.37 

0.08 

94.42 

0.09 

94.54 

0.08 

93.94 

0.08 

0.21 

95.00 

0.07 

94.80 

0.07 

94.38 

0.07 

94.12 

0.07 

93.94 

0.07 

0.14 

94.74 

0.06 

95.07 

0.06 

94.14 

0.06 

93.72 

0.06 

93.90 

0.06 

0.09 

94.70 

0.05 

95.10 

0.05 

94.14 

0.05 

93.80 

0.05 

93.60 

0.05 

250 

0.45 

94.92 

0.06 

94.83 

0.06 

95.12 

0.06 

95.06 

0.06 

94.96 

0.06 

0.31 

94.74 

0.05 

94.43 

0.05 

94.80 

0.05 

94.64 

0.05 

94.84 

0.05 

0.21 

94.78 

0.05 

94.43 

0.05 

94.82 

0.05 

94.32 

0.05 

94.52 

0.05 

0.14 

94.78 

0.04 

95.27 

0.04 

94.54 

0.04 

94.28 

0.04 

94.60 

0.04 

0.09 

94.68 

0.03 

94.97 

0.03 

94.36 

0.03 

94.24 

0.03 

94.52 

0.03 

Cost 2

10 

0.89 

91.46 

0.63 

95.13 

0.61 

91.26 

0.59 

89.16 

0.61 

85.76 

0.60 

0.66 

91.90 

0.60 

96.63 

0.60 

92.30 

0.58 

88.90 

0.56 

85.24 

0.56 

0.46 

91.04 

0.53 

96.50 

0.55 

90.80 

0.54 

87.50 

0.48 

83.92 

0.48 

0.31 

90.02 

0.45 

96.40 

0.48 

90.46 

0.47 

85.32 

0.39 

83.54 

0.38 

0.20 

89.00 

0.36 

95.97 

0.41 

89.54 

0.39 

83.02 

0.30 

83.46 

0.29 

50 

0.89 

94.18 

0.29 

94.77 

0.29 

94.02 

0.29 

93.74 

0.29 

93.24 

0.29 

0.66 

94.28 

0.27 

96.00 

0.27 

94.14 

0.27 

93.82 

0.27 

93.16 

0.27 

0.46 

94.18 

0.24 

95.57 

0.24 

94.02 

0.24 

93.20 

0.24 

92.94 

0.24 

0.31 

93.84 

0.21 

95.17 

0.21 

93.84 

0.21 

92.76 

0.20 

92.82 

0.20 

0.20 

93.28 

0.17 

94.70 

0.17 

93.58 

0.17 

92.30 

0.16 

92.76 

0.16 

100 

0.89 

94.86 

0.21 

94.47 

0.21 

93.94 

0.21 

93.90 

0.21 

93.66 

0.21 

0.66 

94.78 

0.19 

95.10 

0.19 

94.20 

0.19 

93.98 

0.19 

93.92 

0.19 

0.46 

94.60 

0.17 

94.70 

0.17 

94.20 

0.17 

93.82 

0.17 

93.62 

0.17 

0.31 

94.26 

0.15 

95.00 

0.15 

94.54 

0.15 

93.66 

0.14 

93.58 

0.14 

0.20 

93.88 

0.12 

95.00 

0.12 

94.32 

0.12 

93.54 

0.11 

93.74 

0.11 

250 

0.89 

94.62 

0.13 

94.93 

0.13 

95.02 

0.13 

94.92 

0.13 

94.56 

0.13 
Table B.7 - continued from previous page


Uj  BC3 

Delta 

GCI 

BCa 

BP 

AN 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

0.66 

94.68 

0.12 

95.00 

0.12 

94.80 

0.12 

94.74 

0.12 

94.36 

0.12 

0.46 

94.58 

0.11 

95.13 

0.11 

94.68 

0.11 

94.56 

0.11 

94.36 

0.11 

0.31 

94.74 

0.09 

94.77 

0.09 

94.56 

0.09 

94.54 

0.09 

94.50 

0.09 

0.20 

94.70 

0.07 

95.00 

0.07 

94.70 

0.08 

94.48 

0.07 

94.56 

0.07 

GCI  -  generalized  confidence  interval;  BCa  -  bias  corrected  and  accelerated;  BP  -  basic  percentile; 
AN  -  asymptotic  normal;  Cov  -  coverage;  Len  -  length 
Table B.8: Coverage probability and length for 95% parametric CIs around θ₁ under unequal costs
and three classes with a normally distributed feature.


Delta  GCI  BCa  BP  AN 


BCa 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cost 1

10 

0.45 

91.28 

52.0 

97.00 

4.33 

90.08 

6.75 

94.74 

4.78 

92.68 

2.38 

0.31 

93.42 

0.93 

96.73 

1.44 

92.24 

1.23 

93.82 

1.22 

93.84 

0.99 

0.21 

93.04 

0.89 

97.13 

1.10 

93.00 

0.95 

93.62 

0.95 

92.52 

0.93 

0.14 

92.30 

0.93 

95.43 

1.07 

93.40 

1.01 

92.92 

0.99 

92.48 

1.00 

0.09 

92.26 

1.02 

96.07 

1.13 

93.34 

1.12 

92.80 

1.09 

92.52 

1.10 

50 

0.45 

94.04 

0.49 

95.73 

0.54 

93.36 

0.52 

93.30 

0.53 

93.22 

0.50 

0.31 

95.02 

0.40 

95.50 

0.41 

93.80 

0.40 

93.90 

0.39 

94.22 

0.40 

0.21 

94.38 

0.39 

94.90 

0.40 

94.14 

0.39 

94.24 

0.39 

94.44 

0.39 

0.14 

94.10 

0.41 

94.57 

0.42 

94.50 

0.42 

94.38 

0.41 

94.60 

0.42 

0.09 

94.22 

0.46 

95.23 

0.46 

94.62 

0.46 

94.38 

0.46 

94.18 

0.46 

100 

0.45 

94.80 

0.35 

94.90 

0.36 

94.16 

0.35 

94.16 

0.34 

94.42 

0.35 

0.31 

94.98 

0.28 

95.10 

0.29 

94.82 

0.28 

94.82 

0.28 

94.96 

0.28 

0.21 

95.02 

0.27 

95.20 

0.28 

94.28 

0.27 

94.36 

0.27 

94.32 

0.27 

0.14 

95.14 

0.29 

95.57 

0.29 

94.40 

0.29 

94.24 

0.29 

94.16 

0.29 

0.09 

94.92 

0.32 

94.87 

0.32 

94.38 

0.32 

94.28 

0.32 

94.14 

0.32 

250 

0.45 

94.72 

0.22 

94.47 

0.22 

94.76 

0.22 

94.68 

0.22 

94.84 

0.22 

0.31 

94.64 

0.18 

95.83 

0.18 

94.78 

0.18 

94.96 

0.18 

94.86 

0.18 

0.21 

94.62 

0.17 

95.30 

0.17 

94.72 

0.17 

94.78 

0.17 

94.74 

0.17 

0.14 

95.18 

0.18 

94.43 

0.18 

94.88 

0.18 

94.88 

0.18 

94.78 

0.18 

0.09 

95.14 

0.20 

95.17 

0.20 

94.70 

0.20 

94.54 

0.20 

94.66 

0.20 

Cost 2

10 

0.89 

87.90 

446 

96.43 

8.79 

88.80 

11.5 

87.44 

8.16 

89.38 

5.23 

0.66 

92.22 

49.5 

96.97 

5.26 

89.94 

4.94 

93.26 

3.56 

89.48 

2.89 

0.46 

93.56 

7.47 

96.37 

2.96 

90.94 

2.52 

93.96 

1.94 

90.28 

1.64 

0.31 

93.04 

3.13 

97.10 

1.81 

92.32 

1.46 

93.74 

1.30 

90.84 

1.16 

0.20 

92.26 

1.09 

96.23 

1.36 

92.84 

1.22 

92.84 

1.18 

91.56 

1.14 

50 

0.89 

93.78 

15.9 

95.60 

2.63 

94.18 

2.97 

94.70 

2.69 

93.30 

2.27 

0.66 

94.02 

0.52 

95.23 

0.58 

93.36 

0.61 

94.40 

0.60 

93.00 

0.55 

0.46 

93.96 

0.45 

94.73 

0.47 

93.24 

0.46 

93.42 

0.45 

93.34 

0.45 

0.31 

94.34 

0.45 

94.83 

0.46 

93.42 

0.45 

93.46 

0.45 

93.36 

0.45 

0.20 

94.12 

0.48 

95.07 

0.49 

93.60 

0.48 

93.42 

0.48 

93.56 

0.48 

100 

0.89 

95.08 

0.59 

94.83 

1.10 

94.60 

1.23 

96.02 

1.28 

94.16 

1.03 

0.66 

94.88 

0.37 

95.03 

0.38 

93.84 

0.37 

94.74 

0.37 

94.18 

0.37 

0.46 

94.78 

0.32 

94.60 

0.33 

94.26 

0.32 

94.66 

0.32 

94.54 

0.32 

0.31 

95.04 

0.32 

95.40 

0.32 

94.22 

0.32 

94.42 

0.32 

94.34 

0.32 

0.20 

95.18 

0.34 

95.53 

0.34 

94.00 

0.34 

94.06 

0.34 

94.04 

0.34 

250 

0.89 

94.98 

0.34 

94.93 

0.36 

95.22 

0.37 

96.24 

0.38 

95.14 

0.36 

Continued  on  next  page 
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Table  B.8  -  continued  from  previous  page 


Uj  BC3 

Delta 

GCI 

BCa 

BP 

AN 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

Cov 

Len 

0.66 

94.86 

0.23 

94.43 

0.23 

94.84 

0.23 

95.38 

0.23 

95.08 

0.23 

0.46 

95.04 

0.20 

94.90 

0.20 

94.64 

0.20 

94.50 

0.20 

94.34 

0.20 

0.31 

95.38 

0.20 

94.93 

0.20 

94.40 

0.20 

94.42 

0.20 

94.32 

0.20 

0.20 

95.54 

0.22 

94.70 

0.22 

94.36 

0.22 

94.32 

0.21 

94.34 

0.22 

GCI  -  generalized  confidence  interval;  BCa  -  bias  corrected  and  accelerated;  BP  -  basic  percentile; 
AN  -  asymptotic  normal;  Cov  -  coverage;  Len  -  length 
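The Cov and Len columns summarise repeated simulations: coverage is the percentage of simulated intervals that contain the true value, and length is presumably the average interval width. A minimal R sketch of that summary, assuming the simulated lower and upper limits are stored as vectors; the helper `summarise_ci` and the toy t-interval simulation are illustrative only and are not the dissertation's simulation code.

```r
# Hypothetical helper: coverage (%) and mean length of simulated CIs.
summarise_ci <- function(lower, upper, truth) {
  stopifnot(length(lower) == length(upper))
  covered <- lower <= truth & truth <= upper
  c(Cov = 100 * mean(covered), Len = mean(upper - lower))
}

# Toy check with a known target: t intervals for a normal mean.
set.seed(1)
R <- 2000          # number of simulated data sets (illustrative)
n <- 50
truth <- 0
ci <- t(replicate(R, {
  x <- rnorm(n, mean = truth)
  mean(x) + c(-1, 1) * qt(0.975, n - 1) * sd(x) / sqrt(n)
}))
res <- summarise_ci(ci[, 1], ci[, 2], truth)
print(round(res, 2))
```

For a method with correct 95% coverage, the reported Cov should sit near 95 up to simulation noise.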
Table B.9: Coverage probability and length for 95% parametric CIs around θ2 under unequal costs
and three classes with a normally distributed feature.

                     Delta          GCI            BCa            BP             AN
Cost    n_j   BC3    Cov    Len     Cov    Len     Cov    Len     Cov    Len     Cov    Len
Cost 1  10    0.45   91.54  28.4    97.23  4.56    89.86  7.97    94.54  5.61    91.88  2.55
              0.31   93.44  0.93    97.23  1.50    91.82  1.35    94.02  1.35    93.02  1.03
              0.21   92.86  0.89    97.17  1.10    93.18  0.97    93.76  0.97    92.96  0.94
              0.14   92.70  0.93    95.53  1.07    93.74  1.02    93.22  1.00    92.60  1.00
              0.09   92.66  1.03    96.40  1.13    93.84  1.12    93.10  1.10    92.42  1.11
        50    0.45   94.30  0.49    95.73  0.53    93.22  0.53    93.56  0.53    93.32  0.50
              0.31   94.62  0.40    95.50  0.42    93.96  0.40    94.02  0.39    94.42  0.40
              0.21   94.66  0.39    94.90  0.40    94.46  0.39    94.68  0.39    94.58  0.39
              0.14   94.60  0.41    94.57  0.42    94.76  0.42    94.58  0.41    94.50  0.42
              0.09   94.52  0.46    95.23  0.46    94.64  0.46    94.36  0.46    94.20  0.46
        100   0.45   94.88  0.35    94.90  0.36    93.54  0.35    93.40  0.34    93.82  0.35
              0.31   95.08  0.28    95.10  0.29    94.40  0.28    94.54  0.28    94.62  0.28
              0.21   95.00  0.27    95.20  0.28    94.48  0.27    94.72  0.27    94.74  0.27
              0.14   94.98  0.29    95.57  0.29    94.64  0.29    94.80  0.29    94.78  0.29
              0.09   95.20  0.32    94.87  0.32    95.12  0.32    95.08  0.32    95.12  0.32
        250   0.45   94.86  0.22    94.47  0.22    94.44  0.22    94.56  0.22    94.76  0.22
              0.31   95.56  0.18    95.83  0.18    94.98  0.18    94.94  0.18    95.12  0.18
              0.21   95.42  0.17    95.30  0.17    94.90  0.17    95.04  0.17    95.24  0.17
              0.14   95.18  0.18    94.43  0.18    95.12  0.18    95.14  0.18    95.28  0.18
              0.09   94.84  0.20    95.17  0.20    94.92  0.20    94.96  0.20    94.96  0.20
Cost 2  10    0.89   92.40  21.6    98.03  2.95    87.92  3.64    95.64  3.43    93.98  2.17
              0.66   93.22  5.80    97.43  1.61    91.66  1.32    94.66  1.51    93.64  1.28
              0.46   92.70  1.71    96.57  1.20    93.28  1.07    94.34  1.16    93.18  1.03
              0.31   92.64  0.94    96.00  1.10    93.90  1.06    93.94  1.10    92.74  1.04
              0.27   92.60  1.03    95.67  1.14    94.04  1.14    93.42  1.15    92.40  1.12
        50    0.89   93.98  0.58    95.57  0.63    92.34  0.58    92.78  0.58    94.28  0.58
              0.66   94.68  0.43    96.57  0.46    93.56  0.43    93.92  0.43    94.58  0.43
              0.46   94.76  0.39    94.93  0.41    94.72  0.40    94.78  0.40    94.68  0.40
              0.31   94.58  0.41    94.73  0.42    94.72  0.42    94.66  0.42    94.42  0.42
              0.27   94.50  0.46    95.07  0.46    94.62  0.46    94.36  0.46    94.20  0.46
        100   0.89   95.10  0.41    95.40  0.42    93.22  0.41    93.48  0.40    94.26  0.40
              0.66   94.84  0.30    95.20  0.31    94.10  0.30    94.40  0.30    94.58  0.30
              0.46   94.94  0.28    94.63  0.28    94.42  0.28    94.54  0.28    94.56  0.28
              0.31   95.00  0.29    95.53  0.29    94.68  0.29    94.84  0.29    94.76  0.29
              0.27   95.18  0.32    94.50  0.32    95.14  0.32    95.10  0.32    95.08  0.32
        250   0.89   95.40  0.26    95.03  0.26    94.34  0.26    94.66  0.26    94.84  0.26
              0.66   95.52  0.19    94.87  0.19    94.70  0.19    94.86  0.19    94.92  0.19
              0.46   95.20  0.18    94.90  0.18    95.08  0.18    95.20  0.18    95.38  0.18
              0.31   95.12  0.18    95.47  0.18    95.20  0.19    95.20  0.18    95.22  0.18
              0.27   94.88  0.20    94.93  0.20    94.92  0.20    94.96  0.20    94.96  0.20

GCI  -  generalized  confidence  interval;  BCa  -  bias  corrected  and  accelerated;  BP  -  basic  percentile; 
AN  -  asymptotic  normal;  Cov  -  coverage;  Len  -  length 
B.2 Additional CI performance results from Chapter 4


Table B.10: Simulation coverage probability and length for the nonparametric bootstrapped 95% CI
around BC2 for two classes with a normally distributed feature when all C_{i|j}P_j = 1, for i ≠ j.

n1    n2    BC2    Coverage    Length
5     5     0.6    90.30       0.47
            0.4    63.60       0.35
            0.2    0.00        0.17
            0.1    0.00        0.06
6     9     0.6    91.47       0.45
            0.4    84.90       0.35
            0.2    0.00        0.18
            0.1    0.00        0.07
10    10    0.6    92.93       0.48
            0.4    94.80       0.39
            0.2    7.40        0.22
            0.1    0.00        0.10
12    18    0.6    90.27       0.42
            0.4    93.63       0.35
            0.2    73.47       0.22
            0.1    0.00        0.11
20    20    0.6    91.33       0.42
            0.4    94.83       0.35
            0.2    90.5        0.24
            0.1    0.00        0.13
22    28    0.6    87.10       0.35
            0.4    91.97       0.30
            0.2    91.60       0.20
            0.1    1.63        0.12
30    30    0.6    92.47       0.37
            0.4    95.43       0.32
            0.2    95.37       0.22
            0.1    69.70       0.14
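Coverage entries close to the nominal 95 (for example 94.83 versus 95) should be read against Monte Carlo error. For R simulated data sets, the binomial standard error of an estimated coverage probability p is sqrt(p(1 - p)/R). A short R sketch; the replication count R = 5000 is a placeholder, since the number of replications is not stated in this part of the appendix.

```r
# Monte Carlo standard error of an estimated coverage probability p
# computed from R independent simulated data sets.
mc_se <- function(p, R) sqrt(p * (1 - p) / R)

R <- 5000                      # placeholder replication count
se <- mc_se(0.95, R)
# Approximate 95% range (in percent) for the observed coverage of an
# interval whose true coverage is exactly 95%.
band <- 100 * (0.95 + c(-1, 1) * qnorm(0.975) * se)
print(round(band, 2))
```

Differences well outside such a band, such as the 0.00 coverages for small BC2 with few observations, are not attributable to simulation noise.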
Appendix C: R Code
C.1 R Code

C.1.1 Delta Method 95% CIs

## INPUTS TO CHANGE ##
p1 <- NA   # SET prevalence of class 1
p2 <- NA   # SET prevalence of class 2
p3 <- NA   # SET prevalence of class 3
w21 <- NA  # SET cost 2|1
w31 <- NA  # SET cost 3|1
w12 <- NA  # SET cost 1|2
w32 <- NA  # SET cost 3|2
w13 <- NA  # SET cost 1|3
w23 <- NA  # SET cost 2|3
nx <- NA   # SIZE of class 1
ny <- NA   # SIZE of class 2
nz <- NA   # SIZE of class 3
X <- NA    # vector of values for class 1
Y <- NA    # vector of values for class 2
Z <- NA    # vector of values for class 3

## Starting thresholds and box constraints for the optimizer
start <- c(-0.1, 0)
L <- c(-1000, -1000)
U <- c(1000, 1000)

## Estimated means and standard deviations of the three classes
gmu1 <- mean(X)
gmu2 <- mean(Y)
gmu3 <- mean(Z)
gsig1 <- sd(X)
gsig2 <- sd(Y)
gsig3 <- sd(Z)
gmu <- c(gmu1, gmu2, gmu3)
gsig <- c(gsig1, gsig2, gsig3)

## Bayes Cost at thresholds par = c(c1, c2) for class means mu and SDs sig
## (classify as class 1 below c1, class 2 between c1 and c2, class 3 above c2)
bayes_cost <- function(par, mu, sig) {
  (pnorm(par[2], mu[1], sig[1]) - pnorm(par[1], mu[1], sig[1])) * (p1 * w21) +
    (1 - pnorm(par[2], mu[1], sig[1])) * (p1 * w31) +
    pnorm(par[1], mu[2], sig[2]) * (p2 * w12) +
    (1 - pnorm(par[2], mu[2], sig[2])) * (p2 * w32) +
    pnorm(par[1], mu[3], sig[3]) * (p3 * w13) +
    (pnorm(par[2], mu[3], sig[3]) - pnorm(par[1], mu[3], sig[3])) * (p3 * w23)
}
min_bc <- function(mu, sig) {
  nlminb(start, bayes_cost, mu = mu, sig = sig, lower = L, upper = U)
}

x <- min_bc(gmu, gsig)
c1 <- x$par[1]
c2 <- x$par[2]
EBC <- x$objective

## ESTIMATE PARTIALS FOR THETA (the thresholds) by central differences:
## perturb one mean or SD by +/- eps, re-optimize, and difference
eps <- 1e-4
d_theta <- function(type, k) {
  plus <- list(mu = gmu, sig = gsig)
  minus <- plus
  plus[[type]][k] <- plus[[type]][k] + eps
  minus[[type]][k] <- minus[[type]][k] - eps
  (min_bc(plus$mu, plus$sig)$par - min_bc(minus$mu, minus$sig)$par) / (2 * eps)
}
d <- d_theta("mu", 1);  dc1m1 <- d[1]; dc2m1 <- d[2]   # wrt mean 1
d <- d_theta("mu", 2);  dc1m2 <- d[1]; dc2m2 <- d[2]   # wrt mean 2
d <- d_theta("mu", 3);  dc1m3 <- d[1]; dc2m3 <- d[2]   # wrt mean 3
d <- d_theta("sig", 1); dc1s1 <- d[1]; dc2s1 <- d[2]   # wrt sigma 1
d <- d_theta("sig", 2); dc1s2 <- d[1]; dc2s2 <- d[2]   # wrt sigma 2
d <- d_theta("sig", 3); dc1s3 <- d[1]; dc2s3 <- d[2]   # wrt sigma 3

## calc partial of BC function wrt Mean 1, in three parts
dp1 <- (1 / gsig1) * ((dc2m1 - 1) * dnorm((c2 - gmu1) / gsig1) * (w21 * p1 - w31 * p1) -
                        w21 * p1 * dnorm((c1 - gmu1) / gsig1) * (dc1m1 - 1))
dp2 <- (1 / gsig2) * (w12 * p2 * dnorm((c1 - gmu2) / gsig2) * dc1m1 +
                        w32 * p2 * dnorm((gmu2 - c2) / gsig2) * (-dc2m1))
dp3 <- (1 / gsig3) * (dc1m1 * dnorm((c1 - gmu3) / gsig3) * (w13 * p3 - w23 * p3) +
                        w23 * p3 * dnorm((c2 - gmu3) / gsig3) * dc2m1)
dbcm1 <- dp1 + dp2 + dp3

## calc partial of BC function wrt Mean 2, in three parts
dp1 <- (1 / gsig1) * (dc2m2 * dnorm((c2 - gmu1) / gsig1) * (w21 * p1 - w31 * p1) -
                        w21 * p1 * dnorm((c1 - gmu1) / gsig1) * dc1m2)
dp2 <- (1 / gsig2) * (w12 * p2 * dnorm((c1 - gmu2) / gsig2) * (dc1m2 - 1) +
                        w32 * p2 * dnorm((gmu2 - c2) / gsig2) * (1 - dc2m2))
dp3 <- (1 / gsig3) * (dc1m2 * dnorm((c1 - gmu3) / gsig3) * (w13 * p3 - w23 * p3) +
                        w23 * p3 * dnorm((c2 - gmu3) / gsig3) * dc2m2)
dbcm2 <- dp1 + dp2 + dp3

## calc partial of BC function wrt Mean 3, in three parts
dp1 <- (1 / gsig1) * (dc2m3 * dnorm((c2 - gmu1) / gsig1) * (w21 * p1 - w31 * p1) -
                        w21 * p1 * dnorm((c1 - gmu1) / gsig1) * dc1m3)
dp2 <- (1 / gsig2) * (w12 * p2 * dnorm((c1 - gmu2) / gsig2) * dc1m3 +
                        w32 * p2 * dnorm((gmu2 - c2) / gsig2) * (-dc2m3))
dp3 <- (1 / gsig3) * ((dc1m3 - 1) * dnorm((c1 - gmu3) / gsig3) * (w13 * p3 - w23 * p3) +
                        w23 * p3 * dnorm((c2 - gmu3) / gsig3) * (dc2m3 - 1))
dbcm3 <- dp1 + dp2 + dp3

## calc partial of BC function wrt Sigma 1, in three parts
dp1 <- (1 / gsig1) * (dnorm((c2 - gmu1) / gsig1) *
                        (w21 * p1 * (dc2s1 - (c2 - gmu1) / gsig1) +
                           w31 * p1 * (-dc2s1 - (gmu1 - c2) / gsig1)) -
                        w21 * p1 * dnorm((c1 - gmu1) / gsig1) * (dc1s1 - (c1 - gmu1) / gsig1))
dp2 <- (1 / gsig2) * (w12 * p2 * dnorm((c1 - gmu2) / gsig2) * dc1s1 -
                        w32 * p2 * dnorm((gmu2 - c2) / gsig2) * dc2s1)
dp3 <- (1 / gsig3) * (dnorm((c1 - gmu3) / gsig3) * dc1s1 * (w13 * p3 - w23 * p3) +
                        w23 * p3 * dnorm((c2 - gmu3) / gsig3) * dc2s1)
dbcs1 <- dp1 + dp2 + dp3

## calc partial of BC function wrt Sigma 2, in three parts
dp1 <- (1 / gsig1) * (dnorm((c2 - gmu1) / gsig1) * dc2s2 * (w21 * p1 - w31 * p1) -
                        w21 * p1 * dnorm((c1 - gmu1) / gsig1) * dc1s2)
dp2 <- (1 / gsig2) * (w12 * p2 * dnorm((c1 - gmu2) / gsig2) * (dc1s2 - (c1 - gmu2) / gsig2) +
                        w32 * p2 * dnorm((gmu2 - c2) / gsig2) * (-dc2s2 - (gmu2 - c2) / gsig2))
dp3 <- (1 / gsig3) * (dnorm((c1 - gmu3) / gsig3) * dc1s2 * (w13 * p3 - w23 * p3) +
                        w23 * p3 * dnorm((c2 - gmu3) / gsig3) * dc2s2)
dbcs2 <- dp1 + dp2 + dp3

## calc partial of BC function wrt Sigma 3, in three parts
dp1 <- (1 / gsig1) * (dnorm((c2 - gmu1) / gsig1) * dc2s3 * (w21 * p1 - w31 * p1) -
                        w21 * p1 * dnorm((c1 - gmu1) / gsig1) * dc1s3)
dp2 <- (1 / gsig2) * (w12 * p2 * dnorm((c1 - gmu2) / gsig2) * dc1s3 +
                        w32 * p2 * dnorm((gmu2 - c2) / gsig2) * (-dc2s3))
dp3 <- (1 / gsig3) * (dnorm((c1 - gmu3) / gsig3) * (dc1s3 - (c1 - gmu3) / gsig3) * (w13 * p3 - w23 * p3) +
                        w23 * p3 * dnorm((c2 - gmu3) / gsig3) * (dc2s3 - (c2 - gmu3) / gsig3))
dbcs3 <- dp1 + dp2 + dp3

## Calc variances of parameters
## var of mean
vm1 <- gsig1^2 / nx
vm2 <- gsig2^2 / ny
vm3 <- gsig3^2 / nz
## (large-sample) var of sigma
vs1 <- gsig1^2 / (2 * (nx - 1))
vs2 <- gsig2^2 / (2 * (ny - 1))
vs3 <- gsig3^2 / (2 * (nz - 1))

## Calc variance of Bayes Cost
VBC <- dbcm1^2 * vm1 + dbcs1^2 * vs1 +
  dbcm2^2 * vm2 + dbcs2^2 * vs2 +
  dbcm3^2 * vm3 + dbcs3^2 * vs3
## Calc variance of threshold 1
VC1 <- dc1m1^2 * vm1 + dc1s1^2 * vs1 +
  dc1m2^2 * vm2 + dc1s2^2 * vs2 +
  dc1m3^2 * vm3 + dc1s3^2 * vs3
## Calc variance of threshold 2
VC2 <- dc2m1^2 * vm1 + dc2s1^2 * vs1 +
  dc2m2^2 * vm2 + dc2s2^2 * vs2 +
  dc2m3^2 * vm3 + dc2s3^2 * vs3

## CI results
LBC1 <- c1 - 1.96 * sqrt(VC1)
UBC1 <- c1 + 1.96 * sqrt(VC1)
LBC2 <- c2 - 1.96 * sqrt(VC2)
UBC2 <- c2 + 1.96 * sqrt(VC2)
LBBC <- EBC - 1.96 * sqrt(VBC)
UBBC <- EBC + 1.96 * sqrt(VBC)
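The delta-method listing combines squared partial derivatives with the variances of the estimated means (s^2/n) and standard deviations (approximately s^2/(2(n - 1))), treating the six estimates as independent. The same first-order step, isolated as a small R function and applied to one misclassification rate at a fixed threshold, is sketched below. The function name `delta_ci`, the threshold `thr` and the simulated data are illustrative assumptions; `qnorm(0.975)` is used in place of the listing's rounded 1.96.

```r
# First-order delta method with independent parameter estimates:
# Var(h(theta_hat)) ~ sum_k (dh/dtheta_k)^2 * Var(theta_hat_k).
delta_ci <- function(est, grad, vars, level = 0.95) {
  se <- sqrt(sum(grad^2 * vars))
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = est - z * se, upper = est + z * se)
}

# Illustration: h(mu, sigma) = P(X < thr) = pnorm((thr - mu) / sigma).
set.seed(2)
x <- rnorm(40, mean = 1, sd = 2)
n <- length(x)
m <- mean(x)
s <- sd(x)
thr <- 0.5                                        # hypothetical fixed threshold
zc <- (thr - m) / s
est <- pnorm(zc)
grad <- c(-dnorm(zc) / s, -dnorm(zc) * zc / s)    # d/dmu, d/dsigma
vars <- c(s^2 / n, s^2 / (2 * (n - 1)))           # Var(mean), approx. Var(sd)
print(delta_ci(est, grad, vars))
```

In the listing the thresholds themselves depend on the parameters, which is why its partials carry the extra dc terms obtained by re-optimizing.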

C.1.2 Generalized 95% CIs

## INPUTS TO CHANGE ##
p1 <- NA   # SET prevalence of class 1
p2 <- NA   # SET prevalence of class 2
p3 <- NA   # SET prevalence of class 3
w21 <- NA  # SET cost 2|1
w31 <- NA  # SET cost 3|1
w12 <- NA  # SET cost 1|2
w32 <- NA  # SET cost 3|2
w13 <- NA  # SET cost 1|3
w23 <- NA  # SET cost 2|3
nx <- NA   # SIZE of class 1
ny <- NA   # SIZE of class 2
nz <- NA   # SIZE of class 3
X <- NA    # vector of values for class 1
Y <- NA    # vector of values for class 2
Z <- NA    # vector of values for class 3
K <- 1500  # number of GPQ draws; change if a K other than 1500 is desired

## Calculations, do not change
start <- c(-0.1, 0)
L <- c(-1000, -1000)
U <- c(1000, 1000)
ybar1 <- mean(X)
ybar2 <- mean(Y)
ybar3 <- mean(Z)
var1 <- var(X)
var2 <- var(Y)
var3 <- var(Z)

## Create generalized pivotal quantities for each mean and variance
t1 <- rt(K, nx - 1)
t2 <- rt(K, ny - 1)
t3 <- rt(K, nz - 1)
V1 <- rchisq(K, nx - 1)
V2 <- rchisq(K, ny - 1)
V3 <- rchisq(K, nz - 1)
Rs1 <- (nx - 1) * var1 / V1
Rs2 <- (ny - 1) * var2 / V2
Rs3 <- (nz - 1) * var3 / V3
Rm1 <- ybar1 - t1 * sqrt(var1 / nx)
Rm2 <- ybar2 - t2 * sqrt(var2 / ny)
Rm3 <- ybar3 - t3 * sqrt(var3 / nz)

## Bayes Cost at thresholds par = c(c1, c2) for class means mu and SDs sig
h <- function(par, mu, sig) {
  (pnorm(par[2], mu[1], sig[1]) - pnorm(par[1], mu[1], sig[1])) * (p1 * w21) +
    (1 - pnorm(par[2], mu[1], sig[1])) * (p1 * w31) +
    pnorm(par[1], mu[2], sig[2]) * (p2 * w12) +
    (1 - pnorm(par[2], mu[2], sig[2])) * (p2 * w32) +
    pnorm(par[1], mu[3], sig[3]) * (p3 * w13) +
    (pnorm(par[2], mu[3], sig[3]) - pnorm(par[1], mu[3], sig[3])) * (p3 * w23)
}

## Find K BC and optimal threshold values using numerical minimization
BC <- rep(-9999, K)
C1 <- rep(-9999, K)
C2 <- rep(-9999, K)
for (i in 1:K) {
  sols <- optim(start, h, mu = c(Rm1[i], Rm2[i], Rm3[i]),
                sig = sqrt(c(Rs1[i], Rs2[i], Rs3[i])),
                lower = L, upper = U, method = "L-BFGS-B")
  BC[i] <- sols$value
  C1[i] <- sols$par[1]
  C2[i] <- sols$par[2]
}

## CI results
LBC1 <- quantile(C1, 0.025)
UBC1 <- quantile(C1, 0.975)
LBC2 <- quantile(C2, 0.025)
UBC2 <- quantile(C2, 0.975)
LBBC <- quantile(BC, 0.025)
UBBC <- quantile(BC, 0.975)
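Each generalized pivotal quantity in the listing replaces a class's unknown mean and variance by a random quantity built from the observed mean, the observed variance and a fresh t or chi-square draw. For a single normal sample, the 2.5% and 97.5% quantiles of the mean's GPQ reproduce the usual t interval, which gives a convenient sanity check. A sketch under that single-class assumption; `gpq_normal` is a hypothetical helper, not part of the original code.

```r
# GPQs for one normal sample, matching the construction in the listing:
#   R_mu     = ybar - T * sqrt(s^2 / n),  T ~ t_{n-1}
#   R_sigma2 = (n - 1) * s^2 / V,         V ~ chi^2_{n-1}
gpq_normal <- function(x, K = 1500) {
  n <- length(x)
  ybar <- mean(x)
  s2 <- var(x)
  list(mu = ybar - rt(K, n - 1) * sqrt(s2 / n),
       sigma2 = (n - 1) * s2 / rchisq(K, n - 1))
}

set.seed(3)
x <- rnorm(25, mean = 10, sd = 3)
draws <- gpq_normal(x, K = 20000)
gpq_int <- quantile(draws$mu, c(0.025, 0.975))
t_int <- mean(x) + c(-1, 1) * qt(0.975, length(x) - 1) * sd(x) / sqrt(length(x))
print(rbind(GPQ = gpq_int, t = t_int))
```

The two rows agree up to Monte Carlo error in the quantiles; in the three-class listing the same draws are pushed through the Bayes Cost minimization instead.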

C.1.3 Fiducial 95% CI for BC with Equal Weights

## INPUTS for Setup ##
n1 <- NA     # sample size, class 1
n2 <- NA     # sample size, class 2
n3 <- NA     # sample size, class 3
BChat <- NA  # estimated BC

### Do not change
## Each class contributes (misclassified, correct) counts; g picks out the
## misclassification columns 1, 3 and 5
g <- c(1, 0, 1, 0, 1, 0)
Umat <- c(1 / n1, 0, 1 / n2, 0, 1 / n3, 0)
BChat <- round(BChat, 5)

## Sample space for each class: rows (k, n - k), k = 0, ..., n
ss1 <- cbind(0:n1, n1 - (0:n1))
ss2 <- cbind(0:n2, n2 - (0:n2))
ss3 <- cbind(0:n3, n3 - (0:n3))
len1 <- nrow(ss1)
len2 <- nrow(ss2)
len3 <- nrow(ss3)
LEN <- len1 * len2 * len3

## Joint sample space (all combinations across classes) and observed BC
col1 <- kronecker(ss1, rep(1, len2 * len3))
col2 <- kronecker(rep(1, len1), kronecker(ss2, rep(1, len3)))
col3 <- kronecker(rep(1, len1 * len2), ss3)
SS <- cbind(col1, col2, col3)
U1 <- round(SS %*% Umat, 5)
SS <- cbind(SS, U1)
SSOR <- SS[order(SS[, 7]), ]   # outcomes sorted by observed BC
end <- nrow(SSOR)

## CREATE probability sample spaces SSP1 to SSP4 (steps .1, .05, .01, .005)
prob_grid <- function(step) {
  pvec <- seq(from = 0, to = 1, by = step)
  len <- length(pvec)
  col1 <- kronecker(pvec, rep(1, len * len))
  col2 <- kronecker(rep(1, len), kronecker(pvec, rep(1, len)))
  col3 <- kronecker(rep(1, len), kronecker(rep(1, len), pvec))
  cbind(col1, 1 - col1, col2, 1 - col2, col3, 1 - col3)
}
SSP1 <- prob_grid(0.1)
SSP2 <- prob_grid(0.05)
SSP3 <- prob_grid(0.01)
SSP4 <- prob_grid(0.005)

## Locate the outcomes whose observed BC equals BChat
BCmatch <- which(SSOR[, 7] == BChat[1])
if (length(BCmatch) == 0) stop("BChat is not an attainable value of the observed BC")
LBound <- min(BCmatch)
UBound <- max(BCmatch)
above <- if (UBound < end) (UBound + 1):end else integer(0)

## Define partial CDFs. Each class is binomial, so the product
## n! p^k (1 - p)^(n - k) / (k! (n - k)!) is evaluated with dbinom()
outcome_prob <- function(p, rows) {
  sum(dbinom(SSOR[rows, 1], n1, p[1]) *
        dbinom(SSOR[rows, 3], n2, p[3]) *
        dbinom(SSOR[rows, 5], n3, p[5]))
}
f1 <- function(p) outcome_prob(p, above)           # P(BC > BChat)
f2 <- function(p) outcome_prob(p, LBound:UBound)   # P(BC = BChat)

## For each distinct BC value on grid P, the largest tail probability
bc_profile <- function(P, tailp) {
  bc <- as.vector(round(P %*% g, 5))
  vals <- unique(bc)
  cbind(vals, sapply(vals, function(b) max(tailp[bc == b])))
}
## GET UB from P(BC <= BChat) = 1 - f1
get_ub <- function(P) {
  BCcdf <- bc_profile(P, 1 - apply(P, 1, f1))
  temp <- max(BCcdf[BCcdf[, 2] <= 0.025, 2])
  min(BCcdf[BCcdf[, 2] == temp, 1])
}
## GET LB from P(BC >= BChat) = f1 + f2
get_lb <- function(P) {
  BCcdf <- bc_profile(P, apply(P, 1, f1) + apply(P, 1, f2))
  temp <- max(BCcdf[BCcdf[, 2] <= 0.025, 2])
  max(BCcdf[BCcdf[, 2] == temp, 1])
}
## Grid points whose BC lies within +/- width of the current bound
near <- function(P, bound, width) {
  bc <- as.vector(round(P %*% g, 5))
  P[bc <= bound + width & bc >= bound - width, , drop = FALSE]
}

## Find solution, 1st iteration
UB <- get_ub(SSP1)
LB <- get_lb(SSP1)
## Refine in on solution 1st time
UB <- get_ub(near(SSP2, UB, 0.2))
LB <- get_lb(near(SSP2, LB, 0.2))
## Refine in on solution 2nd time
UB <- get_ub(near(SSP3, UB, 0.1))
LB <- get_lb(near(SSP3, LB, 0.1))
## Refine in on solution 3rd time
UB <- get_ub(near(SSP4, UB, 0.05))
LB <- get_lb(near(SSP4, LB, 0.05))

## CI results
print(c(LB, UB))
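The partial CDFs f1 and f2 sum, over outcomes, products of terms n! p^k (1 - p)^(n - k) / (k!(n - k)!), one per class; each such term is a binomial probability. A quick R check of that identity, which also shows why `dbinom()` is the safer way to evaluate it: `factorial()` overflows to `Inf` for arguments above 170, so the factorial form breaks down for large class sizes. The values of n, k and p below are arbitrary.

```r
# One class with n observations, k misclassified, misclassification prob p
n <- 8
k <- 0:n
p <- 0.3
by_factorials <- factorial(n) * (p^k / factorial(k)) * ((1 - p)^(n - k) / factorial(n - k))
by_dbinom <- dbinom(k, n, p)
print(all.equal(by_factorials, by_dbinom))   # same values up to rounding
print(sum(by_dbinom))                        # probabilities over the sample space sum to 1

# factorial() overflows beyond 170, whereas dbinom() stays finite
print(is.finite(suppressWarnings(factorial(171))))
print(dbinom(100, 200, 0.5))
```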

C.1.4 Fiducial 95% CI for BC with Unequal Weights

#Inputs  to  Change 
pl<-#SET  Prevelance  Class  1 
p2<-#SET  Prevelance  Class  2 
p3<-#SET  Prevelance  Class  3 
w21<-#SET  COST  2|1 
w31<-#SET  COST  3|1 
wl2<-#SET  COST  1|2 
w32<-#SET  COST  3|2 
wl3<-#SET  COST  1|3 
w23<-#SET  COST  2|3 
nl<-#Sample  Size  Class  1 
n2<-#Sample  Size  Class  2 
n3<-#Sample  Size  Class  3 
BChat<-#Estimated  Bayes  Cost 

##CREATE  MULTINOMIAL  SAMPLESPACE  VIA  Weizhen  Wang  2012 

#Class  1  SS 

row  =  (nl  +  l)*(nl+2)/2*4 

ss  =  matrix  ( 1 :  row  ,  ncol=4) 

nn  =  l 

fn  =  l 

while  ( nn<n  + 1 +0.5)  { 

low  =  fn-nn  +  l 

up=fn 

s  s  [  low  :  up  ,  1  ]  =  n+l-nn 
uu=up-low 
s  s  [  low  :  up  ,  2  ]  =  0 :  uu 
nn=nn+l 
fn  =  fn+nn 
1 

SS  [  ,3]  =  n-ss  [  ,1]  -  SS  [  ,2] 
ssl<-ss  [  ,1:3] 

#Class  2  SS 

row  =  (n2  +  l)*(n2  +  2)/2*4 

ss  =  matrix  ( 1 :  row  ,  ncol=4) 

nn  =  l 

fn  =  l 

while  ( nn<n  + 1 +0.5)  { 

low  =  fn-nn  +  l 

up=fn 

s s  [  low  :  up  ,  1  ]  =  n+l-nn 

uu=up-low 

s  s  [  low  :  up  ,  2  ]  =  0 :  uu 
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43  nn=nn  +  l 
fn  =  fn+nn 

45  } 

ss[,3]  =  n-ss[,l]-ss[,2] 

47  ss2<-ss[,l:3] 

#Class  3  SS 

49  row  =  (n3  + 1 )  *  ( n3 +2)  /  2*4 
ss  =  matrix  ( 1 :  row  ,  ncol=4) 

51  nn  =  l 
fn  =  l 

53  while  (nn<n  + 1 +0.5)  { 
low  =  fn-nn  +  l 
55  up=fn 

s  s  [  low  :  up  ,  1  ]  =  n+l-nn 
57  uu=up-low 

s  s  [  low  :  up  ,  2  ]  =  0 :  uu 
59  nn=nn  +  l 
fn  =  fn+nn 
61  ) 

ss[,3]  =  n-ss[,l]-ss[,2] 

63  ss3<-ss[,l:3] 

LEN <- length(ss1[,1])*length(ss2[,1])*length(ss3[,1])
len1 <- length(ss1[,1])
len2 <- length(ss2[,1])
len3 <- length(ss3[,1])
v1 <- c(rep(1, len2*len3))
col1 <- kronecker(ss1, v1)
v2 <- c(rep(1, len1))
v3 <- c(rep(1, len3))
col2 <- kronecker(v2, kronecker(ss2, v3))
v4 <- c(rep(1, len1*len2))
col3 <- kronecker(v4, ss3)
SS <- matrix(cbind(col1, col2, col3), ncol=9)

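# Cost per count; the zero-cost first column of each class block is the correct classification
Umat <- c(0, (p1*w21)/n1, (p1*w31)/n1, 0, (p2*w12)/n2, (p2*w32)/n2,
          0, (p3*w13)/n3, (p3*w23)/n3)
# U1 = estimated Bayes Cost of every sample point
U1 <- SS[,1:9] %*% Umat
U1 <- round(U1, 5)
SS <- cbind(SS, U1)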

## Order BC sample space
temp <- data.frame(SS)
SSOR <- temp[order(temp[,10]),]
end <- length(SSOR[,1])

## Probability grid in steps of 0.05
pvec <- seq(from=0, to=1, by=.05)
len <- length(pvec)
v1 <- c(rep(1, len))
col1 <- kronecker(pvec, v1)
col2 <- kronecker(v1, pvec)
col3 <- round(1 - col1 - col2, 10)  # round so floating-point error does not drop valid points
Ps <- cbind(col1, col2, col3)
Pspace <- Ps[-which(Ps[,3] < 0),]
rowp <- length(Pspace[,1])
v1 <- c(rep(1, rowp*rowp))
col1 <- kronecker(Pspace, v1)
v2 <- c(rep(1, rowp))
col2 <- kronecker(v2, kronecker(Pspace, v2))
col3 <- kronecker(v2, kronecker(v2, Pspace))
SSP1 <- matrix(cbind(col1, col2, col3), ncol=9)

g <- c(0, (p1*w21), (p1*w31), 0, (p2*w12), (p2*w32), 0, (p3*w13), (p3*w23))

# f1: P(U < BChat) under probability vector p, summing multinomial
# probabilities over the ordered sample points before LBound
f1 <- function(p){
  idx <- 1:(LBound-1)
  factorial(n1)*factorial(n2)*factorial(n3)*sum(
    ((p[1]^SSOR[idx,1])/factorial(SSOR[idx,1]))*((p[2]^SSOR[idx,2])/factorial(SSOR[idx,2]))*
    ((p[3]^SSOR[idx,3])/factorial(SSOR[idx,3]))*((p[4]^SSOR[idx,4])/factorial(SSOR[idx,4]))*
    ((p[5]^SSOR[idx,5])/factorial(SSOR[idx,5]))*((p[6]^SSOR[idx,6])/factorial(SSOR[idx,6]))*
    ((p[7]^SSOR[idx,7])/factorial(SSOR[idx,7]))*((p[8]^SSOR[idx,8])/factorial(SSOR[idx,8]))*
    ((p[9]^SSOR[idx,9])/factorial(SSOR[idx,9])))
}

# f2: P(U = BChat), summing over the tied sample points LBound:UBound
f2 <- function(p){
  idx <- LBound:UBound
  factorial(n1)*factorial(n2)*factorial(n3)*sum(
    ((p[1]^SSOR[idx,1])/factorial(SSOR[idx,1]))*((p[2]^SSOR[idx,2])/factorial(SSOR[idx,2]))*
    ((p[3]^SSOR[idx,3])/factorial(SSOR[idx,3]))*((p[4]^SSOR[idx,4])/factorial(SSOR[idx,4]))*
    ((p[5]^SSOR[idx,5])/factorial(SSOR[idx,5]))*((p[6]^SSOR[idx,6])/factorial(SSOR[idx,6]))*
    ((p[7]^SSOR[idx,7])/factorial(SSOR[idx,7]))*((p[8]^SSOR[idx,8])/factorial(SSOR[idx,8]))*
    ((p[9]^SSOR[idx,9])/factorial(SSOR[idx,9])))
}

BCmatch <- which(SSOR[,10] == BChat[1])
LBound <- min(BCmatch)
UBound <- max(BCmatch)
BCOUT1 <- apply(SSP1, 1, FUN = f1)
BCOUT2 <- apply(SSP1, 1, FUN = f2)
BCU <- BCOUT1 + BCOUT2                  # P(U <= BChat)
BCL <- c(rep(1, length(BCOUT1))) - BCOUT1  # P(U >= BChat)
BC <- SSP1 %*% g
BC <- round(BC, 5)
blah2 <- cbind(SSP1, BCU, BC)
BCcdf <- unique(BC)
BCcdf <- cbind(BCcdf, rep(-999, length(BCcdf)))
for (i in 1:length(BCcdf[,1])) {
  BCcdf[i,2] <- max(blah2[which(blah2[,11] == BCcdf[i,1]), 10])
}
BCcdf <- data.frame(BCcdf)
BCcdf <- BCcdf[order(BCcdf[,1]),]
BCcdfs <- BCcdf
blah <- length(BCcdfs[,1])
for (i in 1:blah) {
  BCcdfs[i,2] <- max(BCcdf[which(BCcdf[,1] >= BCcdfs[i,1]), 2])
}
BCcdf <- BCcdfs
# GET UB
temp <- max(BCcdf[which(BCcdf[,2] <= 0.025), 2])
UB <- min(BCcdf[which(BCcdf[,2] == temp), 1])

# GET LB
BC <- SSP1 %*% g
BC <- round(BC, 5)
blah2 <- cbind(SSP1, BCL, BC)
BCcdf <- unique(BC)
BCcdf <- cbind(BCcdf, rep(-999, length(BCcdf)))
for (i in 1:length(BCcdf[,1])) {
  BCcdf[i,2] <- max(blah2[which(blah2[,11] == BCcdf[i,1]), 10])
}
BCcdf <- data.frame(BCcdf)
BCcdf <- BCcdf[order(BCcdf[,1]),]
BCcdfs <- BCcdf
blah <- length(BCcdfs[,1])
for (i in 1:blah) {
  BCcdfs[i,2] <- max(BCcdf[which(BCcdf[,1] <= BCcdfs[i,1]), 2])
}
BCcdf <- BCcdfs
temp <- max(BCcdf[which(BCcdf[,2] <= 0.025), 2])
LB <- max(BCcdf[which(BCcdf[,2] == temp), 1])

# CI results
print(c(LB, UB))
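In the listing above, f1 and f2 obtain, for each probability vector on the grid SSP1, the exact probability that the weighted statistic falls below (f1) or exactly at (f2) the observed value. They do this by summing multinomial probabilities over the ordered sample space SSOR. The following is a minimal sketch of the same lower-tail probability for small samples. It uses base R's dmultinom and expand.grid in place of the explicit factorial products. The helper names enum_trinomial and bc_lower_tail, and the cost-matrix layout W[k, j] (cost of labelling a class-k item as class j, zero on the diagonal), are hypothetical choices made for this illustration.

```r
# Hypothetical helper: every count vector (x1, x2, x3) with x1 + x2 + x3 = n.
enum_trinomial <- function(n) {
  g <- expand.grid(x1 = 0:n, x2 = 0:n)
  g <- g[g$x1 + g$x2 <= n, ]
  as.matrix(cbind(g, x3 = n - g$x1 - g$x2))
}

# Hypothetical helper: exact P(U < u0) for the weighted misclassification statistic
#   U = sum_k (prev[k] / n[k]) * sum_j W[k, j] * X[k, j],
# where X[k, ] ~ Multinomial(n[k], theta[k, ]) independently for k = 1, 2, 3.
bc_lower_tail <- function(u0, n, theta, W, prev) {
  parts <- lapply(1:3, function(k) {
    x <- enum_trinomial(n[k])
    list(u  = as.vector(x %*% W[k, ]) * prev[k] / n[k],
         pr = apply(x, 1, dmultinom, prob = theta[k, ]))
  })
  # Index every combination of the three independent class outcomes.
  idx <- expand.grid(i1 = seq_along(parts[[1]]$u),
                     i2 = seq_along(parts[[2]]$u),
                     i3 = seq_along(parts[[3]]$u))
  u  <- parts[[1]]$u[idx$i1] + parts[[2]]$u[idx$i2] + parts[[3]]$u[idx$i3]
  pr <- parts[[1]]$pr[idx$i1] * parts[[2]]$pr[idx$i2] * parts[[3]]$pr[idx$i3]
  # Round as the listing does, so ties in U are compared consistently.
  sum(pr[round(u, 5) < round(u0, 5)])
}
```

The product space grows quickly with the sample sizes. This is one reason the listing sorts the sample space once (SSOR) and then reuses it for every grid point through LBound and UBound.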

C.1.5 Delta Method Hypothesis Tests.

C.1.5.1 One-Sided Test on a Single BC Value.

# Set up (replace each NA)
p1 <- NA    # SET prevalence, class 1
p2 <- NA    # SET prevalence, class 2
p3 <- NA    # SET prevalence, class 3
w21 <- NA   # SET cost 2|1
w31 <- NA   # SET cost 3|1
w12 <- NA   # SET cost 1|2
w32 <- NA   # SET cost 3|2
w13 <- NA   # SET cost 1|3
w23 <- NA   # SET cost 2|3
TV <- NA    # SET BC_not
n1 <- NA    # size, class 1
n2 <- NA    # size, class 2
n3 <- NA    # size, class 3
Y <- NA     # vector of values for class 1
X <- NA     # vector of values for class 2
Z <- NA     # vector of values for class 3
start <- c(-.1, 0)     # starting thresholds (c1, c2)
L <- c(-1000, -1000)   # lower bounds for (c1, c2)
U <- c(1000, 1000)     # upper bounds for (c1, c2)

## Do not change
gmu1 <- mean(Y)
gmu2 <- mean(X)
gmu3 <- mean(Z)
gsig1 <- sd(Y)
gsig2 <- sd(X)
gsig3 <- sd(Z)
# Estimated Bayes Cost of the thresholds par = (c1, c2)
f <- function(par){abs(pnorm(par[2], gmu1, gsig1) - pnorm(par[1], gmu1, gsig1))*(p1*w21) +
  abs(1 - pnorm(par[2], gmu1, gsig1))*(p1*w31) +
  abs(pnorm(par[1], gmu2, gsig2))*(p2*w12) +
  abs(1 - pnorm(par[2], gmu2, gsig2))*(p2*w32) +
  abs(pnorm(par[1], gmu3, gsig3))*(p3*w13) +
  abs(pnorm(par[2], gmu3, gsig3) - pnorm(par[1], gmu3, gsig3))*(p3*w23)}
x <- nlminb(start, f, lower = L, upper = U)
c1 <- x$par[1]
c2 <- x$par[2]
EBC <- x$objective
# Calculate all partial derivatives (dbcm1, dbcs1, ..., dbcs3) as was done for the delta method CI
# Calc variances of parameters
vm1 <- (gsig1^2)/n1
vm2 <- (gsig2^2)/n2
vm3 <- (gsig3^2)/n3
# Calc var of sig using delta method
vs1 <- (gsig1^2)/(2*(n1-1))
vs2 <- (gsig2^2)/(2*(n2-1))
vs3 <- (gsig3^2)/(2*(n3-1))
VBC <- (dbcm1^2)*vm1 + (dbcs1^2)*vs1 +
  (dbcm2^2)*vm2 + (dbcs2^2)*vs2 +
  (dbcm3^2)*vm3 + (dbcs3^2)*vs3
W <- (EBC - TV)/sqrt(VBC)
# Test p-value - to compare to alpha
deltap <- pnorm(W, lower.tail = TRUE)
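The listing above leaves the partial derivatives dbcm1, ..., dbcs3 to the delta-method CI code. The following is a minimal sketch of the whole one-sided test. It assumes the derivatives are taken numerically by central differences rather than analytically. Because the thresholds minimize the Bayes Cost, the envelope theorem allows the derivatives of the minimized cost to be evaluated with the optimal (c1, c2) held fixed. The names bc_trinormal, delta_var_bc, and delta_test_bc, the cost matrix W[k, j] (cost of labelling a class-k item as class j), and the step size h are all assumptions of this sketch.

```r
# Hypothetical helper: Bayes Cost of the rule "class 1 below c1, class 2 between,
# class 3 above c2" for normal classes; theta = (mu1, sd1, mu2, sd2, mu3, sd3).
bc_trinormal <- function(cuts, theta, prev, W) {
  cdf <- function(q, k) pnorm(q, theta[2 * k - 1], theta[2 * k])
  c1 <- cuts[1]; c2 <- cuts[2]
  prev[1] * (W[1, 2] * abs(cdf(c2, 1) - cdf(c1, 1)) + W[1, 3] * (1 - cdf(c2, 1))) +
    prev[2] * (W[2, 1] * cdf(c1, 2) + W[2, 3] * (1 - cdf(c2, 2))) +
    prev[3] * (W[3, 1] * cdf(c1, 3) + W[3, 2] * abs(cdf(c2, 3) - cdf(c1, 3)))
}

# Hypothetical helper: delta-method variance of the minimized Bayes Cost,
# using numerical partial derivatives with the optimal cuts held fixed.
delta_var_bc <- function(cuts, theta, n, prev, W, h = 1e-5) {
  grad <- sapply(seq_along(theta), function(i) {
    e <- replace(numeric(length(theta)), i, h)
    (bc_trinormal(cuts, theta + e, prev, W) -
       bc_trinormal(cuts, theta - e, prev, W)) / (2 * h)
  })
  sds <- theta[c(2, 4, 6)]
  # Same order as theta: var(mean_k) = sd_k^2 / n_k, var(sd_k) ~ sd_k^2 / (2 (n_k - 1)).
  v <- as.vector(rbind(sds^2 / n, sds^2 / (2 * (n - 1))))
  sum(grad^2 * v)
}

# Hypothetical helper: one-sided test of H0: BC >= bc0 against H1: BC < bc0.
delta_test_bc <- function(y1, y2, y3, bc0, prev, W, start = c(-0.1, 0)) {
  theta <- c(mean(y1), sd(y1), mean(y2), sd(y2), mean(y3), sd(y3))
  n <- c(length(y1), length(y2), length(y3))
  fit <- optim(start, bc_trinormal, theta = theta, prev = prev, W = W,
               method = "L-BFGS-B", lower = c(-1000, -1000), upper = c(1000, 1000))
  v <- delta_var_bc(fit$par, theta, n, prev, W)
  list(bc = fit$value, var = v, p.value = pnorm((fit$value - bc0) / sqrt(v)))
}
```

Here p.value plays the role of deltap in the listing, with the analytic derivatives replaced by numerical ones.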

C.1.5.2 One-Sided Test on the Difference of Two Independent BC Values.

p1 <- NA    # SET prevalence, class 1
p2 <- NA    # SET prevalence, class 2
p3 <- NA    # SET prevalence, class 3
w21 <- NA   # SET cost 2|1
w31 <- NA   # SET cost 3|1
w12 <- NA   # SET cost 1|2
w32 <- NA   # SET cost 3|2
w13 <- NA   # SET cost 1|3
w23 <- NA   # SET cost 2|3
n1 <- NA    # size, class 1
n2 <- NA    # size, class 2
n3 <- NA    # size, class 3
TV <- NA    # SET Eta_not (null value of BC_A - BC_B)
YA <- NA    # vector of values for class 1, classification system A
XA <- NA    # vector of values for class 2, classification system A
ZA <- NA    # vector of values for class 3, classification system A
Y <- NA     # vector of values for class 1, classification system B
X <- NA     # vector of values for class 2, classification system B
Z <- NA     # vector of values for class 3, classification system B
start <- c(-.1, 0)     # starting thresholds (c1, c2), as in C.1.5.1
L <- c(-1000, -1000)   # lower bounds for (c1, c2)
U <- c(1000, 1000)     # upper bounds for (c1, c2)

## Do not change
# CS A
gmu1 <- mean(YA)
gmu2 <- mean(XA)
gmu3 <- mean(ZA)
gsig1 <- sd(YA)
gsig2 <- sd(XA)
gsig3 <- sd(ZA)
f <- function(par){abs(pnorm(par[2], gmu1, gsig1) - pnorm(par[1], gmu1, gsig1))*(p1*w21) +
  abs(1 - pnorm(par[2], gmu1, gsig1))*(p1*w31) +
  abs(pnorm(par[1], gmu2, gsig2))*(p2*w12) +
  abs(1 - pnorm(par[2], gmu2, gsig2))*(p2*w32) +
  abs(pnorm(par[1], gmu3, gsig3))*(p3*w13) +
  abs(pnorm(par[2], gmu3, gsig3) - pnorm(par[1], gmu3, gsig3))*(p3*w23)}
x <- optim(start, f, lower = L, upper = U, method = "L-BFGS-B")
c1 <- x$par[1]
c2 <- x$par[2]
EBCA <- x$value

# Calculate all partial derivatives for CS A as was done for the delta CI
# Calc variances of parameters
vm1 <- (gsig1^2)/n1
vm2 <- (gsig2^2)/n2
vm3 <- (gsig3^2)/n3
# Calc var of sig using delta method
vs1 <- (gsig1^2)/(2*(n1-1))
vs2 <- (gsig2^2)/(2*(n2-1))
vs3 <- (gsig3^2)/(2*(n3-1))
VBCA <- (dbcm1^2)*vm1 + (dbcs1^2)*vs1 +
  (dbcm2^2)*vm2 + (dbcs2^2)*vs2 +
  (dbcm3^2)*vm3 + (dbcs3^2)*vs3

# CS B
gmu1 <- mean(Y)
gmu2 <- mean(X)
gmu3 <- mean(Z)
gsig1 <- sd(Y)
gsig2 <- sd(X)
gsig3 <- sd(Z)
f <- function(par){abs(pnorm(par[2], gmu1, gsig1) - pnorm(par[1], gmu1, gsig1))*(p1*w21) +
  abs(1 - pnorm(par[2], gmu1, gsig1))*(p1*w31) +
  abs(pnorm(par[1], gmu2, gsig2))*(p2*w12) +
  abs(1 - pnorm(par[2], gmu2, gsig2))*(p2*w32) +
  abs(pnorm(par[1], gmu3, gsig3))*(p3*w13) +
  abs(pnorm(par[2], gmu3, gsig3) - pnorm(par[1], gmu3, gsig3))*(p3*w23)}
x <- optim(start, f, lower = L, upper = U, method = "L-BFGS-B")
c1 <- x$par[1]
c2 <- x$par[2]
EBC <- x$value

# Calculate all partial derivatives for CS B as was done for the delta CI
# Calc variances of parameters
vm1 <- (gsig1^2)/n1
vm2 <- (gsig2^2)/n2
vm3 <- (gsig3^2)/n3
# Calc var of sig using delta method
vs1 <- (gsig1^2)/(2*(n1-1))
vs2 <- (gsig2^2)/(2*(n2-1))
vs3 <- (gsig3^2)/(2*(n3-1))
VBC <- (dbcm1^2)*vm1 + (dbcs1^2)*vs1 +
  (dbcm2^2)*vm2 + (dbcs2^2)*vs2 +
  (dbcm3^2)*vm3 + (dbcs3^2)*vs3
VETA <- VBCA + VBC
EETA <- EBCA - EBC
W <- (EETA - TV)/sqrt(VETA)
# Test p-value - to compare to alpha
deltap <- pnorm(W, lower.tail = FALSE)
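The difference test combines the two delta-method results. With independent samples for the two classification systems, the estimated difference EBCA - EBC has approximate variance VBCA + VBC, and the p-value is the upper normal tail of W. The following is a minimal sketch of that final step. It takes the two estimates and their delta-method variances as inputs; the function name delta_diff_test is hypothetical.

```r
# Hypothetical helper: one-sided delta-method test of H0: eta <= eta0 against
# H1: eta > eta0, where eta = BC_A - BC_B is estimated from independent samples.
delta_diff_test <- function(bc_a, var_a, bc_b, var_b, eta0 = 0) {
  eta_hat <- bc_a - bc_b
  w <- (eta_hat - eta0) / sqrt(var_a + var_b)  # variances add under independence
  list(eta = eta_hat, w = w, p.value = pnorm(w, lower.tail = FALSE))
}
```

With the listing's variables, delta_diff_test(EBCA, VBCA, EBC, VBC, TV)$p.value equals deltap.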

C.1.6 Generalized Hypothesis Tests.

C.1.6.1 One-Sided Test on a Single BC Value.

# Set up (replace each NA)
p1 <- NA    # SET prevalence, class 1
p2 <- NA    # SET prevalence, class 2
p3 <- NA    # SET prevalence, class 3
w21 <- NA   # SET cost 2|1
w31 <- NA   # SET cost 3|1
w12 <- NA   # SET cost 1|2
w32 <- NA   # SET cost 3|2
w13 <- NA   # SET cost 1|3
w23 <- NA   # SET cost 2|3
TV <- NA    # SET BC_not
n1 <- NA    # size, class 1
n2 <- NA    # size, class 2
n3 <- NA    # size, class 3
Y <- NA     # vector of values for class 1
X <- NA     # vector of values for class 2
Z <- NA     # vector of values for class 3
K <- 2500   # number of pivotal draws; change if K other than 2500 is desired
start <- c(-.1, 0)     # starting thresholds (c1, c2)
L <- c(-1000, -1000)   # lower bounds for (c1, c2)
U <- c(1000, 1000)     # upper bounds for (c1, c2)

## Do not change
# Generalized pivotal quantities: Rm for the means, Rs for the variances.
# Suffix 2 refers to class 1 (Y, n1) and suffix 1 to class 2 (X, n2).
ybar2 <- mean(Y)
ybar1 <- mean(X)
ybar3 <- mean(Z)
var2 <- var(Y)
var1 <- var(X)
var3 <- var(Z)
t1 <- rt(K, n2-1)
t2 <- rt(K, n1-1)
t3 <- rt(K, n3-1)
V1 <- rchisq(K, n2-1)
V2 <- rchisq(K, n1-1)
V3 <- rchisq(K, n3-1)
Rs1 <- c(rep((n2-1)*var1, K))/V1
Rs2 <- c(rep((n1-1)*var2, K))/V2
Rs3 <- c(rep((n3-1)*var3, K))/V3
Rm1 <- c(rep(ybar1, K)) - (t1*(sqrt(var1/n2)))
Rm2 <- c(rep(ybar2, K)) - (t2*(sqrt(var2/n1)))
Rm3 <- c(rep(ybar3, K)) - (t3*(sqrt(var3/n3)))

# Minimized Bayes Cost for one draw x = (mean1, sd1, mean2, sd2, mean3, sd3)
f <- function(x){
  hun2 <- function(par){abs(pnorm(par[2], x[1], x[2]) - pnorm(par[1], x[1], x[2]))*(p1*w21) +
    abs(1 - pnorm(par[2], x[1], x[2]))*(p1*w31) +
    abs(pnorm(par[1], x[3], x[4]))*(p2*w12) +
    abs(1 - pnorm(par[2], x[3], x[4]))*(p2*w32) +
    abs(pnorm(par[1], x[5], x[6]))*(p3*w13) +
    abs(pnorm(par[2], x[5], x[6]) - pnorm(par[1], x[5], x[6]))*(p3*w23)}
  y <- optim(start, hun2, lower = L, upper = U, method = "L-BFGS-B")
  BC <- y$value
  return(BC)
}
ap1 <- cbind(Rm2, sqrt(Rs2), Rm1, sqrt(Rs1), Rm3, sqrt(Rs3))
RBC <- apply(ap1, 1, FUN = f)
# Test p-value - to compare to alpha
genp <- length(which(RBC > TV))/K
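The generalized test replaces each class mean and standard deviation by its generalized pivotal quantity: Rm = ybar - t*s/sqrt(n) with t ~ t(n-1), and Rs = (n-1)s^2/V with V ~ chi-square(n-1). It evaluates the minimized Bayes Cost at every draw and reports the fraction of draws that exceed BC_not. The following sketch isolates that Monte Carlo step. The names gpq_normal and generalized_p are hypothetical, and the argument bc_fun stands in for any function of (mu1, sd1, mu2, sd2, mu3, sd3), such as f in the listing.

```r
# Hypothetical helper: K generalized pivotal draws for the mean and SD of one normal sample.
gpq_normal <- function(y, K) {
  n <- length(y)
  tt <- rt(K, n - 1)
  v <- rchisq(K, n - 1)
  cbind(mu = mean(y) - tt * sd(y) / sqrt(n),
        sigma = sqrt((n - 1) * var(y) / v))
}

# Hypothetical helper: generalized p-value, the share of pivotal BC draws above bc0.
generalized_p <- function(y1, y2, y3, bc0, bc_fun, K = 2500) {
  R <- cbind(gpq_normal(y1, K), gpq_normal(y2, K), gpq_normal(y3, K))
  rbc <- apply(R, 1, bc_fun)
  mean(rbc > bc0)
}
```

With bc_fun set to the listing's f, generalized_p(Y, X, Z, TV, f, K) follows the same steps as genp, up to Monte Carlo variation.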

C.1.6.2 One-Sided Test on the Difference of Two Independent BC Values.

p1 <- NA    # SET prevalence, class 1
p2 <- NA    # SET prevalence, class 2
p3 <- NA    # SET prevalence, class 3
w21 <- NA   # SET cost 2|1
w31 <- NA   # SET cost 3|1
w12 <- NA   # SET cost 1|2
w32 <- NA   # SET cost 3|2
w13 <- NA   # SET cost 1|3
w23 <- NA   # SET cost 2|3
n1 <- NA    # size, class 1
n2 <- NA    # size, class 2
n3 <- NA    # size, class 3
TV <- NA    # SET Eta_not (null value of BC_A - BC_B)
YA <- NA    # vector of values for class 1, classification system A
XA <- NA    # vector of values for class 2, classification system A
ZA <- NA    # vector of values for class 3, classification system A
Y <- NA     # vector of values for class 1, classification system B
X <- NA     # vector of values for class 2, classification system B
Z <- NA     # vector of values for class 3, classification system B
K <- 2500   # number of pivotal draws; change if K other than 2500 is desired
start <- c(-.1, 0)     # starting thresholds (c1, c2), as in C.1.6.1
L <- c(-1000, -1000)   # lower bounds for (c1, c2)
U <- c(1000, 1000)     # upper bounds for (c1, c2)
## Do not change
# CS A: suffix 2 refers to class 1 (YA, n1) and suffix 1 to class 2 (XA, n2)
ybar2 <- mean(YA)
ybar1 <- mean(XA)
ybar3 <- mean(ZA)
var2 <- var(YA)
var1 <- var(XA)
var3 <- var(ZA)
t1 <- rt(K, n2-1)
t2 <- rt(K, n1-1)
t3 <- rt(K, n3-1)
V1 <- rchisq(K, n2-1)
V2 <- rchisq(K, n1-1)
V3 <- rchisq(K, n3-1)
Rs1 <- c(rep((n2-1)*var1, K))/V1
Rs2 <- c(rep((n1-1)*var2, K))/V2
Rs3 <- c(rep((n3-1)*var3, K))/V3
Rm1 <- c(rep(ybar1, K)) - (t1*(sqrt(var1/n2)))
Rm2 <- c(rep(ybar2, K)) - (t2*(sqrt(var2/n1)))
Rm3 <- c(rep(ybar3, K)) - (t3*(sqrt(var3/n3)))

# Minimized Bayes Cost for one draw x = (mean1, sd1, mean2, sd2, mean3, sd3)
f <- function(x){
  hun2 <- function(par){abs(pnorm(par[2], x[1], x[2]) - pnorm(par[1], x[1], x[2]))*(p1*w21) +
    abs(1 - pnorm(par[2], x[1], x[2]))*(p1*w31) +
    abs(pnorm(par[1], x[3], x[4]))*(p2*w12) +
    abs(1 - pnorm(par[2], x[3], x[4]))*(p2*w32) +
    abs(pnorm(par[1], x[5], x[6]))*(p3*w13) +
    abs(pnorm(par[2], x[5], x[6]) - pnorm(par[1], x[5], x[6]))*(p3*w23)}
  y <- optim(start, hun2, lower = L, upper = U, method = "L-BFGS-B")
  BC <- y$value
  return(BC)
}
ap1 <- cbind(Rm2, sqrt(Rs2), Rm1, sqrt(Rs1), Rm3, sqrt(Rs3))
RbcA <- apply(ap1, 1, FUN = f)

# CS B
ybar2 <- mean(Y)
ybar1 <- mean(X)
ybar3 <- mean(Z)
var2 <- var(Y)
var1 <- var(X)
var3 <- var(Z)
t1 <- rt(K, n2-1)
t2 <- rt(K, n1-1)
t3 <- rt(K, n3-1)
V1 <- rchisq(K, n2-1)
V2 <- rchisq(K, n1-1)
V3 <- rchisq(K, n3-1)
Rs1 <- c(rep((n2-1)*var1, K))/V1
Rs2 <- c(rep((n1-1)*var2, K))/V2
Rs3 <- c(rep((n3-1)*var3, K))/V3
Rm1 <- c(rep(ybar1, K)) - (t1*(sqrt(var1/n2)))
Rm2 <- c(rep(ybar2, K)) - (t2*(sqrt(var2/n1)))
Rm3 <- c(rep(ybar3, K)) - (t3*(sqrt(var3/n3)))
ap1 <- cbind(Rm2, sqrt(Rs2), Rm1, sqrt(Rs1), Rm3, sqrt(Rs3))
Rbc <- apply(ap1, 1, FUN = f)
Reta <- RbcA - Rbc
# Test p-value - to compare to alpha
genp <- length(which(Reta < TV))/K
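For two systems, the same pivots are drawn independently for each system. The generalized difference Reta = RbcA - Rbc is formed draw by draw, and the p-value is the share of draws that fall below Eta_not. The following is a minimal sketch. As before, bc_fun is a hypothetical stand-in for f, and the helper names draw_rbc and generalized_p_diff are also hypothetical. Each system's samples are passed as a list of three class vectors.

```r
# Hypothetical helper: K generalized pivotal draws of BC for one system,
# given a list of its three class samples.
draw_rbc <- function(samples, bc_fun, K) {
  R <- do.call(cbind, lapply(samples, function(y) {
    n <- length(y)
    cbind(mean(y) - rt(K, n - 1) * sd(y) / sqrt(n),
          sqrt((n - 1) * var(y) / rchisq(K, n - 1)))
  }))
  apply(R, 1, bc_fun)
}

# Hypothetical helper: generalized p-value for H1: BC_A - BC_B > eta0,
# with the pivots for the two systems drawn independently.
generalized_p_diff <- function(samples_a, samples_b, eta0, bc_fun, K = 2500) {
  reta <- draw_rbc(samples_a, bc_fun, K) - draw_rbc(samples_b, bc_fun, K)
  mean(reta < eta0)
}
```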

C.1.7 Exact Hypothesis Tests.

C.1.7.1 One-Sided Test on a Single BC Value.

# Inputs (replace each NA)
BC0 <- NA    # SET BC_not
n1 <- NA     # sample size, class 1
n2 <- NA     # sample size, class 2
n3 <- NA     # sample size, class 3
BChat <- NA  # estimated BC
p1 <- NA     # SET prevalence, class 1
p2 <- NA     # SET prevalence, class 2
p3 <- NA     # SET prevalence, class 3
w21 <- NA    # SET cost 2|1
w31 <- NA    # SET cost 3|1
w12 <- NA    # SET cost 1|2
w32 <- NA    # SET cost 3|2
w13 <- NA    # SET cost 1|3
w23 <- NA    # SET cost 2|3

## Create SSOR as done in the fiducial CI code
# Create probability space
pvec <- seq(from=0, to=1, by=.05)
len <- length(pvec)
v1 <- c(rep(1, len))
col1 <- kronecker(pvec, v1)
col2 <- kronecker(v1, pvec)
col3 <- round(1 - col1 - col2, 10)  # round so floating-point error does not drop valid points
Ps <- cbind(col1, col2, col3)
Pspace3 <- Ps[-which(Ps[,3] < 0),]
col1b <- pvec
c12 <- 1 - pvec
Pspace2 <- cbind(col1b, c12)
rowp <- length(Pspace3[,1])
v1 <- c(rep(1, rowp*rowp))
col1 <- kronecker(Pspace2, v1)
rowb <- length(Pspace2[,1])
v2 <- c(rep(1, rowp))
v3 <- c(rep(1, rowb))
col2 <- kronecker(v3, kronecker(Pspace3, v2))
col3 <- kronecker(v3, kronecker(v2, Pspace3))
SSP4 <- matrix(cbind(col2, col1, col3), ncol=8)

end <- length(SSOR[,1])
g <- c(0, (p1*w21), (p1*w31), 0, (p2*w12), (p2*w32), 0, (p3*w13), (p3*w23))
BC <- SSP4 %*% g
BC <- round(BC, 5)
SSPtemp <- cbind(SSP4, BC)
# Keep only the grid points in the null region BC >= BC0
SSPn <- SSP4[which(SSPtemp[,9] >= BC0),]
BC <- SSPn %*% g
BC <- round(BC, 5)

# f1: P(statistic <= BChat), summing over the ordered sample points 1:UBound
f1 <- function(p){
  idx <- 1:UBound
  factorial(n1)*factorial(n2)*factorial(n3)*sum(
    ((p[1]^SSOR[idx,1])/factorial(SSOR[idx,1]))*((p[2]^SSOR[idx,2])/factorial(SSOR[idx,2]))*
    ((p[3]^SSOR[idx,3])/factorial(SSOR[idx,3]))*((p[4]^SSOR[idx,4])/factorial(SSOR[idx,4]))*
    ((p[5]^SSOR[idx,5])/factorial(SSOR[idx,5]))*((p[6]^SSOR[idx,6])/factorial(SSOR[idx,6]))*
    ((p[7]^SSOR[idx,7])/factorial(SSOR[idx,7]))*((p[8]^SSOR[idx,8])/factorial(SSOR[idx,8])))
}

BCmatch <- which(SSOR[,9] == BChat[1])
UBound <- max(BCmatch)
BCOUT1 <- apply(SSPn, 1, FUN = f1)
eval1 <- BCOUT1
BCOUT <- cbind(eval1, BC)
# Test p-value - to compare to alpha
p <- max(BCOUT[which(BCOUT[,2] >= BC0), 1])
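The exact test takes the p-value as the largest tail probability P(statistic <= observed) over all grid points in the null region BC >= BC_not. A one-class simplification makes the idea easy to check. Suppose the misclassification count is Binomial(n, q) and BC = q. Then pbinom(x, n, q) decreases in q, so the maximum over the grid should fall at the smallest null grid point, which is q = BC_not when BC_not lies on the grid. The sketch below is that simplified analogue, not the three-class computation. The name exact_p_grid and the 0.05 grid step (matching pvec) are assumptions.

```r
# Hypothetical helper: exact p-value for H0: q >= q0 vs H1: q < q0 with
# X ~ Binomial(n, q), maximizing P_q(X <= x_obs) over null grid points.
exact_p_grid <- function(x_obs, n, q0, by = 0.05) {
  q_grid <- seq(0, 1, by = by)
  q_null <- q_grid[q_grid >= q0 - 1e-12]  # tolerance for floating-point grid values
  max(pbinom(x_obs, n, q_null))
}
```

If BC_not falls between grid points, the maximum is taken at the next grid point above it, so the grid spacing controls how close the search comes to the null boundary.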

C.1.7.2 One-Sided Test on the Difference of Two Independent BC Values, Equal Weights Only.

# Inputs (replace each NA)
n1a <- NA     # sample size, class 1 - CS A
n2a <- NA     # sample size, class 2 - CS A
n3a <- NA     # sample size, class 3 - CS A
n1b <- NA     # sample size, class 1 - CS B
n2b <- NA     # sample size, class 2 - CS B
n3b <- NA     # sample size, class 3 - CS B
Etahat <- NA  # estimated difference BC_A - BC_B
TV <- NA      # SET Eta_not
## Do not change

## Create sample space
# CS A: each class has (n + 1) outcomes (x, n - x)
row = (n1a + 1)*2
ss = matrix(seq(from=0, to=row-1, by=1), ncol=2)
ss[,2] = n1a - ss[,1]
ss1 <- ss[,1:2]
row = (n2a + 1)*2
ss = matrix(seq(from=0, to=row-1, by=1), ncol=2)
ss[,2] = n2a - ss[,1]
ss2 <- ss[,1:2]
row = (n3a + 1)*2
ss = matrix(seq(from=0, to=row-1, by=1), ncol=2)
ss[,2] = n3a - ss[,1]
ss3 <- ss[,1:2]
LEN <- length(ss1[,1])*length(ss2[,1])*length(ss3[,1])
len1 <- length(ss1[,1])
len2 <- length(ss2[,1])
len3 <- length(ss3[,1])
v1 <- c(rep(1, len2*len3))
col1 <- kronecker(ss1, v1)
v2 <- c(rep(1, len1))
v3 <- c(rep(1, len3))
col2 <- kronecker(v2, kronecker(ss2, v3))
v4 <- c(rep(1, len1*len2))
col3 <- kronecker(v4, ss3)
SS1 <- matrix(cbind(col1, col2, col3), ncol=6)

# CS B: each class has (n + 1) outcomes (x, n - x)
row = (n1b + 1)*2
ss = matrix(seq(from=0, to=row-1, by=1), ncol=2)
ss[,2] = n1b - ss[,1]
ss1 <- ss[,1:2]
row = (n2b + 1)*2
ss = matrix(seq(from=0, to=row-1, by=1), ncol=2)
ss[,2] = n2b - ss[,1]
ss2 <- ss[,1:2]
row = (n3b + 1)*2
ss = matrix(seq(from=0, to=row-1, by=1), ncol=2)
ss[,2] = n3b - ss[,1]
ss3 <- ss[,1:2]
LEN <- length(ss1[,1])*length(ss2[,1])*length(ss3[,1])
len1 <- length(ss1[,1])
len2 <- length(ss2[,1])
len3 <- length(ss3[,1])
v1 <- c(rep(1, len2*len3))
col1 <- kronecker(ss1, v1)
v2 <- c(rep(1, len1))
v3 <- c(rep(1, len3))
col2 <- kronecker(v2, kronecker(ss2, v3))
v4 <- c(rep(1, len1*len2))
col3 <- kronecker(v4, ss3)
SS2 <- matrix(cbind(col1, col2, col3), ncol=6)

## Make joint space
len1 <- length(SS1[,1])
len2 <- length(SS2[,1])
LEN <- len1*len2
v1 <- c(rep(1, len2))
col1 <- kronecker(SS1, v1)
v2 <- c(rep(1, len1))
col2 <- kronecker(v2, SS2)
SS <- matrix(cbind(col1, col2), ncol = 12)
Umat <- c(1/n1a, 0, 1/n2a, 0, 1/n3a, 0, -1/n1b, 0, -1/n2b, 0, -1/n3b, 0)
U1 <- SS[,1:12] %*% Umat
U1 <- round(U1, 5)
SS <- cbind(SS, U1)

## Order sample space
temp <- data.frame(SS)
SSOR <- temp[order(temp[,13]),]
# Create prob space to search
pvec <- seq(from=0, to=1, by=.05)
len <- length(pvec)
v1 <- c(rep(1, len*len))
col1 <- kronecker(pvec, v1)
v2 <- c(rep(1, len))
col2 <- kronecker(v2, kronecker(pvec, v2))
col3 <- kronecker(v2, kronecker(v2, pvec))
c12 <- c(1 - col1)
c22 <- c(1 - col2)
c32 <- c(1 - col3)
SSP1a <- cbind(col1, c12, col2, c22, col3, c32)
SSP1b <- SSP1a
len1 <- length(SSP1a[,1])
len2 <- length(SSP1b[,1])
LEN <- len1*len2
v1 <- c(rep(1, len2))
col1 <- kronecker(SSP1a, v1)
v2 <- c(rep(1, len1))
col2 <- kronecker(v2, SSP1b)
SSP4 <- matrix(cbind(col1, col2), ncol = 12)
g <- c(1, 0, 1, 0, 1, 0, -1, 0, -1, 0, -1, 0)
BC <- SSP4 %*% g
BC <- round(BC, 5)
SSPtemp <- cbind(SSP4, BC)
# Keep only the grid points on the null boundary (difference equal to Eta_not)
SSPna <- SSP4[which(SSPtemp[,13] == TV),]
SSPn <- SSPna
BC <- SSPn %*% g
BC <- round(BC, 5)
# f1: P(U < Etahat), summing the joint binomial probabilities of both systems
# over the ordered sample points before LBound
f1 <- function(p){
  idx <- 1:(LBound-1)
  factorial(n1a)*factorial(n2a)*factorial(n3a)*
    factorial(n1b)*factorial(n2b)*factorial(n3b)*sum(
    ((p[1]^SSOR[idx,1])/factorial(SSOR[idx,1]))*((p[2]^SSOR[idx,2])/factorial(SSOR[idx,2]))*
    ((p[3]^SSOR[idx,3])/factorial(SSOR[idx,3]))*((p[4]^SSOR[idx,4])/factorial(SSOR[idx,4]))*
    ((p[5]^SSOR[idx,5])/factorial(SSOR[idx,5]))*((p[6]^SSOR[idx,6])/factorial(SSOR[idx,6]))*
    ((p[7]^SSOR[idx,7])/factorial(SSOR[idx,7]))*((p[8]^SSOR[idx,8])/factorial(SSOR[idx,8]))*
    ((p[9]^SSOR[idx,9])/factorial(SSOR[idx,9]))*((p[10]^SSOR[idx,10])/factorial(SSOR[idx,10]))*
    ((p[11]^SSOR[idx,11])/factorial(SSOR[idx,11]))*((p[12]^SSOR[idx,12])/factorial(SSOR[idx,12])))
}

Etamatch <- which(SSOR[,13] == Etahat[1])
LBound <- min(Etamatch)
BCOUT1 <- apply(SSPn, 1, FUN = f1)
eval1 <- 1 - BCOUT1   # P(U >= Etahat)
BCOUT <- cbind(eval1, BC)
# Test p-value - to compare to alpha
p <- max(BCOUT[which(BCOUT[,2] >= TV), 1])
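With equal weights, each class contributes a binomial count. The statistic ordered in SSOR is then U = sum_k X_kA/n_kA - sum_k X_kB/n_kB, where X is the first count of each class pair. The following is a minimal sketch of the exact upper-tail probability P(U >= u_obs) at a single probability vector. It works by brute-force enumeration of the six independent binomial counts, which is feasible only for small samples. The name exact_upper_tail and its argument layout are hypothetical. The listing then takes the maximum of this tail probability over the grid points kept in SSPn.

```r
# Hypothetical helper: exact P(U >= u_obs) for
#   U = sum_k X_kA / n_kA - sum_k X_kB / n_kB,
# with independent X_kA ~ Binomial(n_kA, q_kA) and X_kB ~ Binomial(n_kB, q_kB), k = 1..3.
exact_upper_tail <- function(u_obs, nA, nB, qA, qB) {
  n <- c(nA, nB)
  q <- c(qA, qB)
  sgn <- rep(c(1, -1), each = 3)
  # Every combination of the six counts (size grows as prod(n + 1)).
  x <- as.matrix(expand.grid(lapply(n, function(m) 0:m)))
  u <- as.vector(x %*% (sgn / n))
  pr <- Reduce(`*`, lapply(1:6, function(k) dbinom(x[, k], n[k], q[k])))
  # Round as the listing does, so ties in U are compared consistently.
  sum(pr[round(u, 5) >= round(u_obs, 5)])
}
```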